Popularized by AlexNet, the Rectified Linear Unit (ReLU) solves a lot of problems.
First of all, it doesn’t kill the gradients for positive inputs. It’s also very cheap to compute and allows faster convergence.
Its two problems are that gradients are killed for negative inputs, and that, like sigmoid, it is not zero-centered (unlike tanh). Here we will introduce some ReLU variants that aim to solve some of these problems:
- Leaky ReLU: doesn’t kill the gradients for negative inputs, thanks to a small coefficient (slope) on the negative side.
- PReLU (Parametric ReLU): like the Leaky ReLU, but makes the coefficient learnable.
- ELU (Exponential Linear Unit): smoother, but less efficient because of the exponential computation.
- SELU (Scaled ELU): better for deep networks, since its self-normalizing properties remove the need for Batch Norm.
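The variants above can be sketched as plain NumPy forward passes; a minimal sketch, assuming the common default slope `alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU, and the fixed SELU constants from the original self-normalizing networks paper:

```python
import numpy as np

def relu(x):
    # max(0, x): passes positive inputs, zeros out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha on negative inputs keeps gradients alive
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # same formula as Leaky ReLU, but alpha is a learned parameter
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential on negative inputs: smooth, but costlier to compute
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # fixed constants from Klambauer et al., chosen so activations
    # self-normalize toward zero mean / unit variance
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

Note that only the forward pass is shown; in a framework, PReLU’s `alpha` would be updated by backpropagation like any other weight.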