The softmax function is an activation function that converts a vector of values into a probability distribution. It is commonly used in multi-class classification tasks to represent the probabilities of each class.

Given an input vector $\mathbf{z} = (z_1, \dots, z_K)$, where each $z_i$ is the raw score (the values are also called logits) for class $i$, the softmax function outputs:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

where:

  •  $\sigma(\mathbf{z})_i$ is the probability of class $i$;
  •  $K$ is the total number of classes;
  •  $e^{z_i}$ exponentiates the score for each class to ensure non-negativity.

The resulting probabilities $\sigma(\mathbf{z})_1, \dots, \sigma(\mathbf{z})_K$ sum to 1, making softmax ideal for representing mutually exclusive class probabilities.
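The definition above can be sketched directly in plain Python. This is a minimal illustration, not a production implementation; the max-subtraction trick is a standard numerical-stability step that leaves the result unchanged because softmax is invariant to adding a constant to every logit.

```python
import math

def softmax(scores):
    # Subtract the max logit so exp() never overflows;
    # this shifts every score by a constant and does not change the output.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Each probability is exp(z_i) divided by the sum of all exp(z_j).
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)            # one probability per class, largest logit -> largest probability
print(sum(probs))       # the probabilities sum to 1
```

In practice you would typically use a library routine such as `scipy.special.softmax` or a framework's built-in layer rather than hand-rolling this.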

Note

Usually this function is used as the last layer of a neural network to ensure the output probability vector sums to 1.


machine-learning