Generative learning refers to a special class of statistical models that are capable of generating content that is very hard to distinguish from reality (or, put another way, fake content that looks real). The generated content could be poems, images, music, songs, videos, 3D objects, or content from any other domain we can imagine. A domain is nothing but a fancy word for a bunch of examples that follow some common pattern. In this article, we will learn how a generative learning model works.
An interesting thing about generative models is that, sometimes, the content they generate is not just realistic but also completely new (i.e., unseen in the training examples). Most of us have seen or heard about modern technologies that can generate very realistic faces of people who do not even exist. Face-aging apps, virtual try-on, photo-to-painting conversion, and many similar applications are all examples of generative models at work.
Let’s now learn how exactly a generative learning model works.
Check out my introductory article on “Generative Learning and its Differences from the Discriminative Learning”
How does a Generative Learning Model work?
To understand how generative learning works, let’s first define an example problem and then discuss how a generative model would solve it. Assume we have a dataset (D) of 1 million cat images representing multiple breeds of cats from across the world, photographed from almost all possible angles. Note that the number 1 million is significant here, as generative models generally require larger datasets to estimate the target distribution more accurately.
Because a generative learning approach solves a problem by estimating the data distribution, our focus is to define a generative model that is capable of learning the distribution (𝑷𝒄𝒂𝒕𝒔) that these cat images represent. Every dataset is sampled from some underlying data distribution, and that distribution is known as the true distribution of that particular dataset. Here, 𝑷𝒄𝒂𝒕𝒔 is the distribution of all possible cat images in the universe, and our dataset is a sample drawn from it, acting as a representative of the true data distribution.
If, somehow, our model is able to learn the distribution 𝑷𝒄𝒂𝒕𝒔, it will be able to answer all possible questions about cats in this universe. For example:
- It will be able to tell whether a given image x represents a cat or not: if the likelihood value 𝒑(𝒙) is high, x is very likely a cat; if it is low, x is probably not.
- Secondly, if you sample an image from 𝒑(𝒙), it will always be a cat image. In this way, the model can generate an unlimited number of cat images.
This example gives us a much better understanding of generative models. We now understand that a generative model first learns the underlying data distribution so that it can later answer any question about that data. In reality, however, learning a data distribution is far from trivial. To understand how complex learning a joint distribution can be, we first need to understand what a joint distribution actually means. The following subsection explains joint distributions.
1. Joint Distribution
As we are all aware, digital images are made up of pixels. Each pixel inside an image represents a colour, and a group of such colour pixels may represent the objects inside that image. In digital computers and smartphones, each pixel is represented using three discrete random variables R, G and B, representing the intensities of the three colours: red, green and blue.
In a given digital image, each colour pixel is represented by these three discrete random variables, where each of R, G and B can take any integer value in the range [0, 255]. We can represent the joint distribution of a single colour pixel by 𝒑(𝑹, 𝑮, 𝑩), such that sampling from this distribution, (𝒓, 𝒈, 𝒃) ~ 𝒑(𝑹, 𝑮, 𝑩), always generates a colour pixel. In this case, the total number of parameters required to specify the joint distribution 𝒑(𝑹, 𝑮, 𝑩) would be:
= 256 × 256 × 256 − 1 = 256^3 − 1
Here, each random variable has 256 possible values, so the total number of parameters required to specify this distribution is one less than the total number of possible combinations, as shown in the calculation above: one probability is fixed by the constraint that all probabilities must sum to 1.
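To see where the “minus one” comes from, here is the sum-to-one constraint written out: once all but one of the 256³ probabilities are chosen, the last one is fully determined.

```latex
\sum_{r=0}^{255}\sum_{g=0}^{255}\sum_{b=0}^{255} p(r, g, b) = 1
\quad\Rightarrow\quad
\text{free parameters} = 256^3 - 1 = 16{,}777{,}215
```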
That was just a single pixel. Now think about an image with 100×100 dimensions (a pretty low-resolution image) that is made up of 10,000 such colour pixels. Can you imagine the number of parameters required to represent the true joint distribution of all possible 100×100 colour images? Pretty huge, right? Let’s calculate it: we just need to multiply the number of possible combinations of one colour pixel ten thousand times. Check the following calculation.
= (256^3 − 1) × (256^3 − 1) × … (10,000 times)
≈ 256^30,000
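You can verify this count with a few lines of Python (Python integers have arbitrary precision, so the arithmetic is exact):

```python
import math

pixel_values = 256 ** 3                  # outcomes of a single (R, G, B) pixel
num_pixels = 100 * 100                   # pixels in a 100 x 100 image
num_images = pixel_values ** num_pixels  # = 256 ** 30,000 possible images
num_params = num_images - 1              # sum-to-one fixes the last probability

# How many decimal digits does this number have?
print(math.floor(math.log10(num_images)) + 1)  # -> 72248
```

A number with over 72,000 decimal digits, just to parameterize the distribution of tiny 100×100 images.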
This number is astronomically huge. Now, if I ask you, can you prepare a dataset that adequately represents the above-described distribution of 100×100 colour images? The answer is pretty obvious –
“Never”.
It is practically impossible to represent the true data distribution in this case; no matter how big your dataset is, it is never enough. Any given dataset, representing a distribution 𝑷𝑫𝒂𝒕𝒂𝒔𝒆𝒕, is only a rough representative of the true data distribution. Now, one question pops up: do we really need to model the true joint distribution? Can we settle for less (something like 𝑷𝑫𝒂𝒕𝒂𝒔𝒆𝒕)?
Actually, modelling a true distribution like this one is pointless, as it is deterministic in nature. In other words, if we already have all the required information about the true distribution, we don’t really need to model it. Take the distribution of all possible 100×100 colour images: we already know that any random 100×100 colour image belongs to this distribution with 100% confidence, so there is no point in learning it.
The data distribution described above is deterministic in nature because we assumed the pixels to be independent of each other. But what if the pixels are somehow related? A relationship between pixels can restrict the true distribution to represent only a particular class of colour images, such as our dataset of 1 million cats, where the pixels are clearly not independent.
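As a toy illustration of this difference (both “images” below are synthetic, invented for this example), compare the correlation between neighbouring pixels in pure noise, where pixels are independent, and in a smooth image, where pixels are related:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: every pixel is drawn independently.
noise = rng.integers(0, 256, size=(100, 100)).astype(float)

# A crude stand-in for a "natural" image: smooth gradients,
# so neighbouring pixels are strongly related.
x = np.linspace(0, 4 * np.pi, 100)
smooth = 127.5 * (1 + np.sin(x)[:, None] * np.cos(x)[None, :])

def neighbour_corr(img):
    """Correlation between horizontally adjacent pixels."""
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]

print(neighbour_corr(noise))   # ~0.0: independent pixels
print(neighbour_corr(smooth))  # close to 1.0: strongly related pixels
```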
The dataset of 1 million cat images can therefore be considered a restricted distribution, thanks to these pixel relationships. Learning this kind of restricted distribution, instead of the true joint distribution described above, can actually be useful. To understand why, let’s look at restricted distributions in more detail.
2. Restricted Distribution
Now, let’s get back to our dataset of 1 million cat images. Assume that this dataset has the distribution 𝑷𝒄𝒂𝒕𝒔, which is supposed to be close to the real distribution of all possible cats in this universe. Imagine we are able to learn a generative model 𝑷𝑴𝒐𝒅𝒆𝒍 (the model distribution) such that 𝑷𝑴𝒐𝒅𝒆𝒍 is very close to 𝑷𝒄𝒂𝒕𝒔.
Using this model distribution (𝑷𝑴𝒐𝒅𝒆𝒍), we should be able to perform tasks such as the following (a minimal code sketch follows the list):
- Generation: Sampling from the model (𝒙𝒏𝒆𝒘 ~ 𝑷𝑴𝒐𝒅𝒆𝒍) will always generate a cat image, giving us the flexibility to generate an unlimited number of cat images if required (see the example in the figure below).
- Prediction: The model will be able to tell whether a given image x represents a cat or not: if the likelihood value 𝑷𝑴𝒐𝒅𝒆𝒍(x) is high, x is likely a cat; if it is low, it is probably not.
- Representation Learning: The model will be capable of learning unsupervised features related to cats, such as breed, colour, eyes, tail and so on, without being explicitly provided labels for these attributes.
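Here is a minimal sketch of the first two tasks, using a Gaussian mixture over 2-D toy “cat features” as a stand-in for 𝑷𝑴𝒐𝒅𝒆𝒍. The data and model choice are invented for illustration; a real image model would be a deep generative network.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy "cat" data: 2-D feature vectors instead of images,
# so the example runs in a second.
rng = np.random.default_rng(0)
cats = rng.normal(loc=[2.0, 3.0], scale=0.5, size=(1000, 2))

# Fit the stand-in model distribution P_Model.
p_model = GaussianMixture(n_components=3, random_state=0).fit(cats)

# Generation: sample new points from P_Model.
x_new, _ = p_model.sample(5)

# Prediction: log-likelihood of a query point under P_Model.
cat_like = p_model.score_samples([[2.0, 3.0]])    # high log-likelihood
not_cat = p_model.score_samples([[20.0, -10.0]])  # very low log-likelihood
print(x_new, cat_like, not_cat)
```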
Given the above notion, a conditional generative model is also possible. Suppose we want to generate a set of variables Y given some other set of variables X; we can directly train a generative model to learn the conditional distribution 𝑷(𝒀|𝑿) without worrying about the joint distribution. This is very similar to sequence generation tasks, where the next element of the sequence is predicted given the elements that already exist.
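As a toy illustration, the following sketch estimates exactly such a conditional distribution, 𝒑(next word | previous word), by counting word bigrams; the corpus and helper names are made up for this example.

```python
from collections import Counter, defaultdict
import random

# A tiny made-up corpus; real sequence models learn the same kind of
# conditional distribution, just with far richer conditioning.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: counts[prev][next] = how often "next" follows "prev".
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev):
    """Sample from the estimated conditional distribution P(next | prev)."""
    words, freqs = zip(*counts[prev].items())
    return random.choices(words, weights=freqs)[0]

random.seed(0)
word, sequence = "the", ["the"]
for _ in range(6):
    if not counts[word]:   # dead end: no observed continuation
        break
    word = sample_next(word)
    sequence.append(word)
print(" ".join(sequence))
```

Another popular example related to conditional generation is the family of latent variable based generative models. Let’s discuss how latent variable based generative models actually work.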
3. Latent variable based Generative Models
To understand latent variable based generative models, let’s get back to our dataset of 1 million cat images. This dataset is not annotated, which means there is no information about what kind of cat is present in a given picture. Now suppose we want to train a generative model on this dataset so that we can later use it to generate cat images, but this time we would like our model to generate images of a desired type of cat instead of random cat images. In other words, we are asking the model to learn unsupervised features along with the data distribution, so that it can answer requests like: generate an orange, long-haired cat! The term ‘unsupervised features’ makes sense here because we are not providing the model with any labelled information for learning these features.
Latent Variable: A latent variable is a variable that is hidden, i.e., not directly observed but instead inferred from other variables that are observed.
The idea is to learn these unsupervised features, such as colour, hair length, pose and so on, with the help of a latent vector (z). Here, the latent vector z is expected to represent these high-level features of cat images. A cat image of the desired type can then be sampled from the conditional distribution 𝒑(𝒙|𝒛), provided we supply the correct value of z. Our objective has now changed: the new goal is to learn the conditional distribution 𝒑(𝒙|𝒛) instead of the more complex joint distribution 𝒑(𝒙). The figure below shows the high-level idea of sampling from this new model; this time, sampling is conditioned on the latent input.
Now the real question is: how do we know which value of z generates which type of cat image? Because the training is completely unsupervised (or unlabelled), we can’t directly control the latent variables. But here is the trick: we let our model learn the conditional distribution 𝒑(𝒙|𝒛), and then we can simply reverse the direction and learn the reverse conditional distribution 𝒑(𝒛|𝒙). Using this simple trick, we can map the generated cats back to their input latent vector z, and hence find out which value of z generates which type of cat image.
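As a rough sketch of these two directions, here is a minimal, untrained PyTorch illustration. The layer sizes, the 8-dimensional latent space and the network shapes are arbitrary assumptions for this example, not any particular model’s architecture; in practice, models such as VAEs learn both directions jointly during training.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 8, 100 * 100 * 3  # arbitrary illustrative sizes

# p(x | z): decoder/generator maps a latent vector to an image.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Sigmoid(),  # pixel intensities in [0, 1]
)

# p(z | x): encoder maps an image back to its latent vector.
encoder = nn.Sequential(
    nn.Linear(image_dim, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

z = torch.randn(1, latent_dim)  # pick a point in latent space
x = decoder(z)                  # "generate" an image from z
z_back = encoder(x)             # map the image back towards z
print(x.shape, z_back.shape)    # torch.Size([1, 30000]) torch.Size([1, 8])
```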
If you are interested in learning more about generative learning and Generative Adversarial Networks, do check out my book:
Our plan seems simple but, again, learning this conditional distribution 𝒑(𝒙|𝒛) is a challenging task, and we cannot be 100% sure that the model will learn the desired features corresponding to different values of the latent vector z (so that we can map them back by reversing the process), because the training is completely unsupervised (or uncontrolled).
Another important thing to consider is that we are talking about training models to learn distributions here. To accomplish that –
“We need a way to compare distributions”!
As we discussed earlier, the real job of a generative model is to learn a distribution 𝑷𝑴𝒐𝒅𝒆𝒍 that is really close to the true data distribution 𝑷𝑻𝑹𝑼𝑬 (here, 𝑷𝑫𝒂𝒕𝒂𝒔𝒆𝒕 can serve as a good proxy for the true distribution). During the learning phase, a generative model needs to know how close the model distribution (𝑷𝑴𝒐𝒅𝒆𝒍) is to the expected distribution (𝑷𝑫𝒂𝒕𝒂𝒔𝒆𝒕). For this reason, we need a function that can efficiently compare and quantify the closeness of two distributions (e.g., 𝑷𝑴𝒐𝒅𝒆𝒍 and 𝑷𝑫𝒂𝒕𝒂𝒔𝒆𝒕).
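One widely used measure of this kind is the Kullback–Leibler (KL) divergence. Here is a minimal discrete example with made-up probability tables:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions (arrays summing to 1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p_dataset = [0.1, 0.4, 0.5]   # the "expected" distribution
p_model_a = [0.1, 0.4, 0.5]   # perfect match
p_model_b = [0.8, 0.1, 0.1]   # poor match

print(kl_divergence(p_dataset, p_model_a))  # 0.0 -> distributions identical
print(kl_divergence(p_dataset, p_model_b))  # ~1.15 -> large mismatch
```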
We will cover the different ways of learning or comparing distributions in my subsequent articles.
Conclusion
In this article, we learned some basics of generative models. We also saw how a generative learning model works, with the help of a few examples, and covered joint distributions, restricted distributions and latent variable based generative models at a very high level. The next important topic is finding ways to learn and compare distributions in practice, which we will cover in subsequent articles on generative learning.
I hope this article was helpful and gave you a good basic understanding of generative models. If you found it useful, kindly share it; if you find any mistakes, please let me know by commenting below.
See you in my next articles on generative learning!!
Read Next>>>
- Generative Learning and its Differences from the Discriminative Learning
- Building Blocks of Deep Generative Models
- Deep Learning with PyTorch: Introduction
- Deep Learning with PyTorch: First Neural Network
- Autoencoders in Keras and Deep Learning (Introduction)
- Optimizers explained for training Neural Networks
- Optimizing TensorFlow models with Quantization Techniques