What are Autoregressive Generative Models?
The term ‘autoregressive’ comes from the field of time-series forecasting, where a model predicts a future value by conditioning on all of its past observations, in order.
Autoregressive generative models are quite similar in nature: they use all of their past predictions to decide the next one. For example, an autoregressive model may generate an image one pixel at a time, where each pixel value is conditioned on the previously generated pixel values. In other words, the model learns the conditional distribution of each individual pixel given the previously generated (or known) pixels.
Remember, this kind of sequence prediction task is not new; it has been applied successfully in natural language processing and other sequence learning tasks. But the job of an autoregressive generative model is considerably harder, because the sequences involved can be very long.
Understanding how Autoregressive Generative Models work
For example, generating an image of 100 x 100 dimensions can be seen as a sequence prediction task where the sequence length is 10,000, obtained by arranging the image pixels in raster-scan order (reading the pixels left-to-right and top-to-bottom). Generating a sequence this long, where each pixel is conditioned on all previous pixels, is quite complex and time-consuming.
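The raster-scan ordering described above is just a flattening of the image array. A minimal sketch in NumPy (the image here is random, purely for illustration):

```python
import numpy as np

# A hypothetical 100 x 100 grayscale image with pixel values in [0, 255].
image = np.random.randint(0, 256, size=(100, 100))

# Raster scan: read pixels left-to-right, top-to-bottom.
# NumPy's default (C-order) flattening does exactly this.
sequence = image.flatten()

print(sequence.shape)  # (10000,) -- one sequence of 10,000 pixels
```

An autoregressive model would then treat `sequence` as a 10,000-step prediction problem, generating pixel i conditioned on pixels 0 through i-1.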
In mathematical terms, the objective of an autoregressive generative model is to learn a data distribution using maximum likelihood estimation (MLE). Given an n-dimensional dataset to learn the distribution from, we can write the joint distribution using the chain rule of probability as:
p(x1, x2, x3, …, xn) = p(x1) * p(x2|x1) * p(x3|x1, x2) * … * p(xn|x1, x2, …, xn-1)
The chain rule is a fully general way of factorizing a joint probability: it assumes no conditional independence between the random variables. And that is exactly how autoregressive models work (they are Bayesian networks with no conditional independence assumptions).
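To make the factorization concrete, here is a toy example with three binary pixels. The probability values are made up purely for demonstration; the point is that the joint probability is just the product of the conditionals:

```python
# Chain rule with three binary pixels (illustrative numbers only).
p_x1 = 0.6              # p(x1 = 1)
p_x2_given_x1 = 0.7     # p(x2 = 1 | x1 = 1)
p_x3_given_x1_x2 = 0.9  # p(x3 = 1 | x1 = 1, x2 = 1)

# Joint probability of observing the sequence (1, 1, 1):
p_joint = p_x1 * p_x2_given_x1 * p_x3_given_x1_x2
print(p_joint)  # 0.378
```

An autoregressive model learns each of these conditional factors from data; the chain rule then gives the likelihood of any full sequence.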
To understand Maximum Likelihood Estimation (MLE), do check out my article on:
Learning an Autoregressive Generative Model
To learn an autoregressive generative model, we first fix an ordering of the random variables (say x1, x2, …, xn) and then learn a mapping function for each subsequent random variable in the following stepwise manner:
- x1 is modeled on its own (sampled from its marginal distribution)
- x2 can be estimated using x1
- x3 can be estimated using x1 and x2
- . . . . . . . .
- . . . . . . . .
- . . . . . . . .
- xn can be estimated using x1, x2 ……, xn-1
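The stepwise recipe above can be sketched with the simplest possible choice: one linear model per variable, fit by least squares on a toy continuous-valued dataset. The dataset and model class here are assumptions for illustration; in practice each step is usually a (shared) neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars = 500, 5
X = rng.normal(size=(n_samples, n_vars))  # toy dataset, one row per sample

# One linear model per variable: predict x_i from x_1 .. x_{i-1}.
models = []
for i in range(1, n_vars):
    past = np.hstack([X[:, :i], np.ones((n_samples, 1))])  # inputs + bias column
    target = X[:, i]
    weights, *_ = np.linalg.lstsq(past, target, rcond=None)
    models.append(weights)

print(len(models))  # one fitted model per conditional: x2|x1, x3|x1,x2, ...
```

Note how the i-th model only ever sees the variables that come before it in the chosen ordering; this is the defining constraint of an autoregressive model.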
Here, each step is a model that learns a linear (or, in the case of neural networks, non-linear) combination of all previously estimated random variables in order to estimate the next one.
Each model has O(n) parameters, so the overall setup has O(n²) parameters. The main drawback is that the data generation process is slow, because the variables must be estimated sequentially (similar to a for loop).
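The sequential nature of generation can be seen in a minimal sketch. The conditional sampler below is a stand-in (a running mean plus Gaussian noise), not a real learned model; the point is the for loop, where step i cannot begin until steps 0 through i-1 are finished:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # sequence length (this would be 10,000 for a 100 x 100 image)

# Stand-in conditional sampler: draws x_i given all previous values.
def sample_next(previous):
    mean = previous.mean() if previous.size else 0.0
    return mean + rng.normal(scale=0.1)

# Generation is an explicit for loop: inherently sequential, O(n) steps.
x = np.empty(n)
for i in range(n):
    x[i] = sample_next(x[:i])

print(x.shape)
```

No matter how fast each conditional is to evaluate, the n steps cannot be parallelized at generation time, which is why sampling from autoregressive models is slow.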
If you are interested in learning more about generative learning and Generative Adversarial Networks, do check out my book:
Pros and Cons of Autoregressive Generative Models
PixelCNN, PixelRNN, char-CNN, char-RNN, and WaveNet are some popular examples of deep autoregressive generative models. Some of these models have also been applied successfully to anomaly detection and to detecting adversarial attacks.
Here are some pros and cons of the autoregressive generative models:
Pros
- Likelihoods are easy to understand and compute
- The training process is supervised in nature and straightforward
Cons
- Need an ordering of random variables
- Generation process is sequential, hence slow
- High likelihood does not guarantee better looking samples
- Do not learn unsupervised features (representations)
Conclusion
In this article, we learned the basic idea behind autoregressive generative models. We saw how these models approach the problem of learning a distribution using Maximum Likelihood Estimation (MLE). We also covered some common pros and cons of this approach.
In my next article, we will learn more about autoregressive generative models with a hands-on example. I hope this article was helpful; please do share your feedback by commenting below. See you in the next article.
Read Next>>>
- Building Blocks of Deep Generative Models
- Generative Learning and its Differences from the Discriminative Learning
- How Does a Generative Learning Model Work?
- Deep Learning with PyTorch: Introduction
- Deep Learning with PyTorch: First Neural Network
- Autoencoders in Keras and Deep Learning (Introduction)
- Optimizers explained for training Neural Networks
- Optimizing TensorFlow models with Quantization Techniques