Popularized by AlexNet, the Rectified Linear Unit (ReLU) activation solves several problems of earlier activation functions.

First of all, it doesn’t kill the gradients for positive inputs. It is also computationally very cheap and leads to faster convergence.
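As a quick illustration, here is a minimal NumPy sketch of ReLU (framework-agnostic, just for intuition):

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x): positive inputs pass through unchanged,
    # negative inputs are clamped to zero (where the gradient dies).
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # non-positive entries become 0, positives are unchanged
```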

Its two problems are that gradients are killed for negative inputs (the “dying ReLU” problem), and that, like sigmoid, its output is not zero-centered. Here we will introduce some ReLU variants that aim to solve these problems:

  • Leaky ReLU: doesn’t kill the gradients for negative inputs, thanks to a small negative-slope coefficient.
  • PReLU (Parametric ReLU): like the Leaky ReLU, but makes the coefficient learnable.
  • ELU (Exponential Linear Unit): pushes mean activations closer to zero, but is less efficient because of the exponential computation.
  • SELU (Scaled ELU): better for deep networks since it has self-normalizing properties, so it doesn’t need Batch Norm.
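To make the differences concrete, here is a minimal NumPy sketch of these variants. The coefficient values are common defaults (and, for SELU, the constants from the original self-normalizing networks paper), not the only possible choices:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small fixed slope alpha for negative inputs keeps a nonzero gradient.
    return np.where(x > 0, x, alpha * x)

# PReLU is the same formula, but alpha is a learnable parameter
# (typically one per channel) updated by backpropagation.

def elu(x, alpha=1.0):
    # Smooth exponential saturation toward -alpha for negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # Fixed constants from the SELU paper (Klambauer et al., 2017),
    # chosen so activations self-normalize toward zero mean, unit variance.
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-1.0, 0.0, 2.0])
print(leaky_relu(x))
print(elu(x))
print(selu(x))
```

Note how all three keep a nonzero gradient for negative inputs, which is exactly the dying-ReLU problem they address.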