The softmax function is an activation function that converts a vector of values into a probability distribution. It is commonly used in multi-class classification tasks to represent the probabilities of each class.
Given an input vector $\mathbf{z} = (z_1, \ldots, z_K)$, where each $z_i$ is the raw score (the values are also called logits) for class $i$, the softmax function outputs:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

where:
- $\sigma(\mathbf{z})_i$ is the probability of class $i$;
- $K$ is the total number of classes;
- $e^{z_i}$ exponentiates the score for each class to ensure non-negativity.
The resulting probabilities sum to 1, making softmax ideal for representing mutually exclusive class probabilities.
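The formula above can be sketched in a few lines of NumPy; the max-subtraction step is a standard trick for numerical stability (softmax is invariant to adding a constant to every logit, so shifting by the max avoids overflow in the exponentials):

```python
import numpy as np

def softmax(z):
    # Shift by the max logit for numerical stability; this does not
    # change the output, since softmax is shift-invariant.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    # Normalize so the exponentiated scores sum to 1.
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # one probability per class, largest logit -> largest probability
print(probs.sum())  # sums to 1
```

Note that without the shift, a large logit such as 1000 would make `np.exp` overflow to infinity, which is why the subtraction is used in practice.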
Note
Usually this function is used as the last layer of a neural network to ensure the output is a probability vector that sums to 1.