Training stable GANs
Generative Adversarial Networks, or GANs for short, are quite difficult to train in practice. This is due to the nature of GAN training, where two networks compete with each other in a zero-sum game: one model improves at the cost of a degradation in the performance of the other. This contest makes the training process very unstable.
Although there is no hard-and-fast rule for designing a stable GAN architecture, many researchers have studied the behaviour of GAN training under different conditions over the past decade. It has taken a lot of research and experimentation, but these studies have led to a set of guidelines that improve the overall stability of the training process. In this article, we will go through some of the best practices and guidelines for training stable GANs.
After reading this article, you should feel confident about implementing and training stable GANs.
Best Practices for Training Stable GANs
Most of the best practices highlighted in this article are based on the 2016 paper by Salimans et al. titled "Improved Techniques for Training GANs." Some techniques are taken from other studies and research papers, and a few are based on my own experience of training multiple GAN models while writing a book on this topic.
If you are interested in learning more about generative learning and Generative Adversarial Networks, do check out my book:
Following is a consolidated list of some of the best practices that are important to understand and remember:
- Convolutional Layers
- Regularisation
- Activations
- Optimisers
- Feature Matching
- Image Scaling and Mini-Batches
- Label Smoothing
- Weight Initialisation
Now, let’s go through each of these best-practices one-by-one.
1. Convolutional Layers
The Convolutional Neural Network, or CNN, is an obvious choice when it comes to image-related use cases. CNNs are specifically designed for localised feature learning from images. So, if we want to generate images with a GAN-based framework, we should build the GAN with CNN layers. As discussed earlier, GAN training is highly unstable, but fortunately we can make it work if we choose the network architectures and hyper-parameters carefully.
Radford et al. (2015) presented a CNN-based GAN architecture, known as DCGAN, that implements some of these best practices. DCGAN trains quite stably and performs really well on the task of image generation.
Following is a list of some common guidelines for designing CNN-based stable GAN architectures:
- For down-sampling and up-sampling, use strided convolutions instead of pooling layers.
- Use normalisation techniques such as batch normalisation in both the discriminator and the generator network.
- Don't keep too many fully-connected hidden layers in either network.
- Use the LeakyReLU activation in the discriminator network and ReLU in the generator network.
These guidelines can help us design more stable CNN-based GAN architectures.
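To make these guidelines concrete, here is a minimal sketch of a DCGAN-style generator and discriminator in Keras. The image size (64x64 RGB), the 100-dimensional latent vector and the layer widths are illustrative assumptions, not values prescribed by the papers discussed here.

```python
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    return models.Sequential([
        # Project and reshape the latent vector (only one dense layer)
        layers.Dense(8 * 8 * 256, input_dim=latent_dim),
        layers.Reshape((8, 8, 256)),
        # Strided transposed convolutions for up-sampling (no pooling layers)
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        # tanh output matches images scaled to [-1, 1]
        layers.Conv2DTranspose(3, kernel_size=4, strides=2, padding="same",
                               activation="tanh"),
    ])

def build_discriminator(img_shape=(64, 64, 3)):
    return models.Sequential([
        # Strided convolutions for down-sampling (no pooling layers)
        layers.Conv2D(64, kernel_size=4, strides=2, padding="same",
                      input_shape=img_shape),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, kernel_size=4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
```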
Next, let’s learn about regularisation.
2. Regularisation
In machine learning, regularisation techniques are used to prevent a model from overfitting the training data, so that the model also performs well on test data. Batch normalisation is a commonly preferred regularisation technique for neural-network-based models. It makes the training of networks faster and more stable by adding a few extra layers (batch normalisation layers) within the network.
Batch normalisation has a significant impact on stabilising the training of GANs as well. Batch normalisation layers are usually added after the convolutional layers and before the activation layers (ReLU or LeakyReLU) in both the generator and the discriminator network.
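The small sketch below (using the Keras functional API; the filter count and kernel size are arbitrary) illustrates this ordering: convolution, then batch normalisation, then the activation.

```python
from tensorflow.keras import layers

def disc_block(x, filters=128):
    # Convolution -> batch normalisation -> activation
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    return x
```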
3. Activations
Most of the time, when we design neural networks, we don't pay much attention to the activation functions and generally go with a simple choice such as the Rectified Linear Unit, or ReLU, in the intermediate layers. The final-layer activation depends on the type of problem we are solving; for example, 'sigmoid' is useful for binary classification tasks.
Interestingly, in the case of GANs the activation function also plays a critical role in the stability of training. Experiments in the DCGAN paper showed that using the LeakyReLU activation in all layers of the discriminator network, and ReLU in all layers of the generator network except the final layer, results in a stable DCGAN architecture. However, some other studies claim that using LeakyReLU in both networks performs even better.
LeakyReLU is a slight variant of the ReLU activation that allows some negative values to pass through (whereas ReLU converts every negative input to zero). LeakyReLU also provides a parameter for the negative slope; setting it to 0.2 generally gives good results for GANs.
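The tiny example below illustrates the difference on a toy array: ReLU zeroes out negative inputs, while LeakyReLU with a negative slope of 0.2 lets a fraction of them through (a sketch using tf.keras layers).

```python
import numpy as np
from tensorflow.keras import layers

x = np.array([[-2.0, -0.5, 0.0, 1.0]], dtype="float32")
print(layers.ReLU()(x).numpy())                # [[ 0.   0.   0.   1. ]]
print(layers.LeakyReLU(alpha=0.2)(x).numpy())  # [[-0.4 -0.1  0.   1. ]]
```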
4. Optimisers
The basic optimisation algorithm used for training neural networks is Stochastic Gradient Descent (SGD). In practice, researchers have found a few tricks to make the optimisation faster and smoother. These derived optimisation techniques are known as optimisers. Optimisation algorithms also play a key role in the stability of training deep convolutional GANs.
The Adaptive Moment Estimation optimiser, or Adam for short, is a well-known and frequently used variant of SGD. Adam is also recommended for training deep convolutional GANs, with a learning rate of 0.0002 and a beta_1 momentum value of 0.5 (whereas the default beta_1 value is 0.9).
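A minimal sketch of these settings in Keras, assuming `discriminator` is the model built in the earlier sketch:

```python
from tensorflow.keras.optimizers import Adam

opt = Adam(learning_rate=0.0002, beta_1=0.5)  # commonly recommended DCGAN settings
discriminator.compile(loss="binary_crossentropy",
                      optimizer=opt,
                      metrics=["accuracy"])
```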
5. Feature Matching
In this approach, we modify the objective of GAN training slightly. Instead of training the generator to maximise the output of the discriminator, we train the generator to match the statistics of the generated data with those of the real data. In simpler words, we train the generator to match the features of real and generated images at an intermediate layer of the discriminator network. Experiments have shown that feature matching can make otherwise unstable training stable.
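Below is a rough sketch of one way to compute such a loss, assuming `discriminator` is a built Keras model and that the intermediate layer (index 3 here) is chosen arbitrarily; the statistic matched is the mean activation over each mini-batch.

```python
import tensorflow as tf
from tensorflow.keras import Model

# Feature extractor that exposes an intermediate discriminator layer
feature_extractor = Model(inputs=discriminator.input,
                          outputs=discriminator.layers[3].output)

def feature_matching_loss(real_images, fake_images):
    real_feats = feature_extractor(real_images)
    fake_feats = feature_extractor(fake_images)
    # Squared difference between mean features of real and generated batches
    return tf.reduce_mean(tf.square(tf.reduce_mean(real_feats, axis=0) -
                                    tf.reduce_mean(fake_feats, axis=0)))
```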
6. Image Scaling and Mini-Batches
Images are stored as arrays of pixel values, where each pixel has an integer value from 0 to 255 depicting its intensity or brightness. For stable training of GANs for image synthesis, it is recommended that the pixel values of real images be scaled to the range [-1, 1], and that the generator network use the hyperbolic tangent (tanh) activation in its output layer. This scaling is generally performed using the min-max scaling technique.
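A one-line version of this scaling (a sketch assuming 8-bit images loaded as a NumPy array):

```python
import numpy as np

def scale_images(images):
    # Map pixel values from [0, 255] to [-1, 1] to match the tanh output
    return (images.astype("float32") - 127.5) / 127.5
```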
Secondly, it is recommended to train GANs with mini-batches, which helps the model learn faster. In some GAN variants, researchers have recommended batch sizes as small as 1 or 2. It is also recommended to update the weights of the discriminator network with separate mini-batches of real and fake (generated) images.
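Here is a sketch of one discriminator update that follows this advice, assuming `generator`, `discriminator`, a NumPy array `real_images` and a `latent_dim` value from the earlier sketches; the half-batch size of 64 is an arbitrary choice.

```python
import numpy as np

half_batch = 64
idx = np.random.randint(0, real_images.shape[0], half_batch)
real_batch = real_images[idx]
noise = np.random.normal(0, 1, (half_batch, latent_dim))
fake_batch = generator.predict(noise)

# Two separate weight updates: one on real images, one on generated images
d_loss_real = discriminator.train_on_batch(real_batch, np.ones((half_batch, 1)))
d_loss_fake = discriminator.train_on_batch(fake_batch, np.zeros((half_batch, 1)))
```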
7. Label Smoothing
Label smoothing is a technique in which we replace the original classification labels of the training data with smoothed values. For example, in a binary classification problem the original labels of 0 and 1 are replaced with smoothed values such as 0.1 and 0.9 before training. This small trick improves the stability of training and also makes the model more robust. Label smoothing has also been shown to reduce the vulnerability of neural networks to adversarial examples.
Researchers have found that label smoothing also helps in stabilising the training of GANs.
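A common way to apply this to GANs is one-sided label smoothing, where only the "real" labels are smoothed; the sketch below assumes the same `discriminator`, `real_batch` and `fake_batch` as in the earlier snippets.

```python
import numpy as np

half_batch = 64
real_labels = np.full((half_batch, 1), 0.9)  # smoothed labels for real images
fake_labels = np.zeros((half_batch, 1))      # fake labels left at 0

d_loss_real = discriminator.train_on_batch(real_batch, real_labels)
d_loss_fake = discriminator.train_on_batch(fake_batch, fake_labels)
```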
8. Weight Initialisation
Generally, neural network weights are initialised with random values before training starts. Weight initialisation can also affect the results and training behaviour of GANs. It is a best practice to initialise all network weights from a zero-centred Gaussian distribution (the normal, bell-shaped distribution) with a standard deviation of 0.02. In Keras, we can do this by passing the Gaussian initialiser to the 'kernel_initializer' parameter of the convolutional layers.
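In Keras this looks roughly like the following (a sketch; the layer configuration is arbitrary):

```python
from tensorflow.keras import layers
from tensorflow.keras.initializers import RandomNormal

init = RandomNormal(mean=0.0, stddev=0.02)  # zero-centred Gaussian, stddev 0.02
conv = layers.Conv2D(64, kernel_size=4, strides=2, padding="same",
                     kernel_initializer=init)
```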
Conclusion
As we have seen, GANs are very difficult to train due to the adversarial nature of their training. It therefore becomes really important to understand the best practices and recommended tricks that result in stable GAN training.
In this article, we learned about a few ways of making GAN training stable. These practices are quite important to remember, as they come from research papers in which the authors ran a great many experiments to arrive at them.
After reading these best practices, you should be able to define and train a stable GAN architecture for your own experiments quite easily.
Read Next>>>
- What are Autoregressive Generative Models?
- Building Blocks of Deep Generative Models
- Generative Learning and its Differences from the Discriminative Learning
- How Does a Generative Learning Model Work?
- Image Synthesis using Pixel CNN based Autoregressive Generative Models
- AutoEncoders in Keras and Deep Learning
- Optimizers explained for training Neural Networks
- Optimizing TensorFlow models with Quantization Techniques