The softmax function is an activation function that converts a vector of values into a probability distribution. It is commonly used in multi-class classification tasks to represent the probabilities of each class.
Given an input vector $\mathbf{z} = (z_1, \ldots, z_K)$, where each $z_i$ is the raw score (the values are also called logits) for class $i$, the softmax function outputs:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

where:
- $\sigma(\mathbf{z})_i$ is the probability of class $i$;
- $K$ is the total number of classes;
- $e^{z_i}$ exponentiates the score for each class to ensure non-negativity.
The resulting probabilities sum to 1, making softmax ideal for representing mutually exclusive class probabilities.
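The formula above can be sketched in a few lines of NumPy; the max-subtraction step is a standard trick for numerical stability (softmax is invariant to adding a constant to every logit, so shifting by the max avoids overflow in the exponentials):

```python
import numpy as np

def softmax(z):
    # Shift by the max logit for numerical stability; this does not
    # change the output, since softmax is shift-invariant.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    # Normalize so the exponentiated scores sum to 1.
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # one probability per class, largest logit -> largest probability
print(probs.sum())  # sums to 1
```

Note that without the shift, a large logit such as 1000 would make `np.exp` overflow to infinity, which is why the subtraction is used in practice.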
Note
Usually this function is used as the last layer of a neural network to ensure the output is a probability vector that sums to 1.