Popularized by AlexNet, the Rectified Linear Unit (ReLU) solves a lot of problems.
First of all, it doesn’t kill the gradients for positive inputs. It’s also very cheap to compute and allows faster convergence.
Its two problems are that gradients are killed for negative inputs, and that, like sigmoid, it is not zero-centered (unlike tanh). Here we will introduce some ReLU variants that aim to solve some of these problems:
- Leaky ReLU: doesn’t kill the gradients for negative inputs, thanks to a small coefficient (slope) on the negative side.
- PReLU (Parametric ReLU): like the Leaky ReLU, but makes the coefficient learnable.
- ELU (Exponential Linear Unit): smoother, but less efficient because of the exponential computation.
- SELU (Scaled ELU): better for deep networks, since its self-normalizing properties remove the need for Batch Norm.
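The variants above can be sketched as plain NumPy forward passes; a minimal sketch, assuming the common default slope `alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU, and the fixed SELU constants from the original self-normalizing networks paper:

```python
import numpy as np

def relu(x):
    # max(0, x): passes positive inputs, zeros out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha on negative inputs keeps gradients alive
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # same formula as Leaky ReLU, but alpha is a learned parameter
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential on negative inputs: smooth, but costlier to compute
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # fixed constants from Klambauer et al., chosen so activations
    # self-normalize toward zero mean / unit variance
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

Note that only the forward pass is shown; in a framework, PReLU’s `alpha` would be updated by backpropagation like any other weight.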