To better understand how VAEs work, we first have to introduce some concepts from information theory.

Autoencoders are neural networks whose expected output coincides with the input; the architecture contains a bottleneck which forces the model to learn latent representations of the input. The model is composed of two parts:

  • The encoder, which takes the input and transforms it into a lower-dimensional representation.
  • The decoder, which takes the latent code and transforms it back into a vector with the same dimensions as the input. The reconstruction loss ensures that input and output are as similar as possible.
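The encoder/decoder pair above can be sketched as follows. This is a minimal illustration, not a trained model: the weights, dimensions, and the purely linear layers are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 784-dim input (e.g. a flattened 28x28 image)
# squeezed through a 32-dim bottleneck.
input_dim, latent_dim = 784, 32

# Hypothetical (untrained) weights for a linear encoder/decoder pair.
W_enc = rng.normal(scale=0.01, size=(latent_dim, input_dim))
W_dec = rng.normal(scale=0.01, size=(input_dim, latent_dim))

def encode(x):
    # Map the input to a lower-dimensional latent code.
    return W_enc @ x

def decode(z):
    # Map the latent code back to the input space.
    return W_dec @ z

x = rng.normal(size=input_dim)
z = encode(x)          # latent code, shape (32,)
x_hat = decode(z)      # reconstruction, shape (784,)

# Reconstruction loss: mean squared error between input and output.
mse = np.mean((x - x_hat) ** 2)
```

In a real autoencoder the encoder and decoder are deep nonlinear networks and the weights are learned by minimizing `mse` over a dataset; the bottleneck shape is the only structural ingredient shown here.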

VAEs are simply AEs that produce a probability distribution over latent codes, which we can sample from, rather than a single well-defined value.
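Concretely, the VAE encoder outputs the mean and (log-)variance of a Gaussian over latent codes, and a code is drawn via the reparameterization trick. A minimal sketch, where `mu` and `log_var` are placeholder values standing in for actual encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 32

# Stand-ins for the two outputs of a VAE encoder (assumed values).
mu = rng.normal(size=latent_dim)        # mean of q(z|x)
log_var = rng.normal(size=latent_dim)   # log-variance of q(z|x)

def sample_latent(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    # Sampling eps externally keeps z differentiable w.r.t. mu and log_var,
    # which is what makes the VAE trainable by backpropagation.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

z = sample_latent(mu, log_var, rng)
```

Each forward pass draws a different `z` from the same distribution, which is exactly the "distribution instead of a single value" behavior described above.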

Source: Variational autoencoders.

We call $p(z)$ the prior distribution over the latent codes, $p(z \mid x)$ the true posterior distribution, and $q(z \mid x)$ our learned approximation of the posterior.

We introduce $q(z \mid x)$ because we want to compute $p(z \mid x) = p(x \mid z)\,p(z) / p(x)$, but this requires computing the evidence $p(x)$, which is intractable. The solution is to approximate $p(z \mid x)$ with $q(z \mid x)$. To make the approximation work, we use the KL divergence (which measures the difference between two probability distributions) as the function to minimize.
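For a diagonal Gaussian $q(z \mid x)$ and a standard normal prior, this KL term has a well-known closed form, which is the regularizer that appears in the VAE loss. A small sketch of that formula:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL divergence KL( N(mu, sigma^2) || N(0, I) ) for a
    # diagonal Gaussian: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Sanity check: the standard normal has zero divergence from itself.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # → 0.0

# Any other Gaussian has strictly positive divergence from the prior.
print(kl_to_standard_normal(np.ones(4), np.zeros(4)) > 0)  # → True
```

Minimizing this term pulls each $q(z \mid x)$ toward the prior, which is also what gives the latent space the smooth structure discussed below.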

todo: continue with the Notion notes

As said in this video about VQ-VAE, we use VAEs over standard AEs because they impose structure on the latent space, making it more continuous. This makes generation more reliable. If we take a random code and feed it to the decoder of a plain AE, we will probably just get noise, because the latent space has no structure. If we do the same thing with a VAE, we will probably get an image of something. Furthermore, with a VAE we can interpolate between two points by decoding the latent codes that lie between them, again because the latent space of a VAE is smooth.
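The interpolation idea is just a convex combination of two latent codes; each intermediate code would then be passed through the decoder. A minimal sketch (the codes here are made-up placeholders):

```python
import numpy as np

def interpolate(z_a, z_b, steps):
    # Linear interpolation between two latent codes. In a VAE, decoding
    # each intermediate code yields a smooth transition between the two
    # corresponding outputs; in a plain AE it typically yields noise.
    ts = np.linspace(0.0, 1.0, steps)
    return [(1.0 - t) * z_a + t * z_b for t in ts]

# Hypothetical latent codes for two inputs.
z_a, z_b = np.zeros(3), np.ones(3)
codes = interpolate(z_a, z_b, steps=5)
# codes[0] equals z_a, codes[-1] equals z_b, and the midpoint is halfway.
```

Spherical interpolation (slerp) is sometimes preferred over linear interpolation for Gaussian latent spaces, but the linear version is enough to illustrate the point.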


tags: deep-learning