Cross-entropy measures the dissimilarity between two probability distributions: the true labels and the predicted probabilities. (It is not a true distance: it is asymmetric and does not satisfy the triangle inequality.)

Formula (for one example with C classes):

  L = - Σ_{i=1}^{C} y_i · log(p_i)

Where:

  • y_i = ground truth for class i (one-hot encoded, 0 or 1)
  • p_i = predicted probability for class i

Simplified (when y is one-hot and the true class is c):

  L = - log(p_c)

For a batch of N examples, average the per-example losses:

  L = - (1/N) Σ_{n=1}^{N} Σ_{i=1}^{C} y_{n,i} · log(p_{n,i})

Intuition:

  • High loss if the predicted probability for the correct class is low.
  • Low loss if the predicted probability for the correct class is high.
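The two bullets above can be made concrete by evaluating -log(p_correct) at a few probabilities (the variable names are illustrative):

```python
import math

# Loss contributed by the correct class is -log of its predicted probability.
confident_right = -math.log(0.9)   # low loss, roughly 0.105
uncertain       = -math.log(0.5)   # moderate loss, roughly 0.693
confident_wrong = -math.log(0.01)  # high loss, roughly 4.605
```

Note how the loss grows without bound as the probability assigned to the correct class approaches zero.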