Cross-entropy measures the dissimilarity between two probability distributions: the true labels and the predicted probabilities.
Formula (for one example with C classes):

$$L = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$

Where:
- $y_c$ = ground truth for class $c$ (one-hot encoded, 0 or 1)
- $\hat{y}_c$ = predicted probability for class $c$
Simplified (when $y$ is one-hot, only the correct class $k$ contributes):

$$L = -\log(\hat{y}_k)$$
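A quick numeric check of the simplified form, as a sketch in plain Python (the example probabilities are illustrative): summing over all classes with a one-hot label gives exactly the same value as taking $-\log$ of the correct-class probability alone.

```python
import math

# One-hot ground truth: the correct class is index 2.
y = [0, 0, 1, 0]
# Predicted probabilities (must sum to 1), e.g. from a softmax.
y_hat = [0.1, 0.1, 0.7, 0.1]

# Full formula: terms for wrong classes are zeroed out by y_c = 0.
loss_full = -sum(t * math.log(p) for t, p in zip(y, y_hat))

# Simplified formula: -log of the correct-class probability.
loss_simple = -math.log(y_hat[2])

print(loss_full, loss_simple)  # both ≈ 0.3567
```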
For a batch of N examples, average the per-example losses:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
Intuition:
- High loss if the predicted probability for the correct class is low.
- Low loss if the predicted probability for the correct class is high.
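The batch formula and the intuition above can be sketched in plain Python (the function name `cross_entropy` and the example probabilities are my own, not from the source): a confident correct prediction yields a small loss, while a low probability on the correct class yields a large one.

```python
import math

def cross_entropy(y_true, y_pred):
    """Mean cross-entropy over a batch of one-hot labels and predicted distributions."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        # Per-example loss: -sum over classes of y_c * log(y_hat_c).
        total -= sum(t * math.log(p) for t, p in zip(truth, pred))
    return total / len(y_true)

labels = [[0, 1, 0]]

confident = [[0.05, 0.90, 0.05]]  # high probability on the correct class
unsure    = [[0.40, 0.30, 0.30]]  # low probability on the correct class

print(cross_entropy(labels, confident))  # ≈ 0.105 (low loss)
print(cross_entropy(labels, unsure))     # ≈ 1.204 (high loss)
```

In practice, frameworks clamp or fuse the log with the softmax to avoid `log(0)`; this sketch assumes all probabilities are strictly positive.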