Autoencoders in Keras and Deep Learning

By | October 27, 2020

Most of us are familiar with supervised machine learning, where an algorithm learns the relationship between input features and labels from the training data and is then expected to predict the correct labels for unseen test data. Although supervised learning solves many problems, not all problems are supervised in nature. In some problems we just need the algorithm to learn a representation (encoding) of the dataset without requiring any labels. Such algorithms are known as unsupervised learning algorithms, and autoencoders belong to this family. In this tutorial, we will learn about Autoencoders in Keras and Deep Learning.

Clustering algorithms (for example, KMeans and DBSCAN) are also examples of unsupervised learning techniques, as they do not require labeled data to learn relationships/representations in data. Autoencoders are deep learning-based (neural network) algorithms that are widely used to solve many complex tasks.

Popular tasks that are solved using autoencoder-based deep learning models include dimensionality reduction, feature learning, noise removal, and data generation; each of them comes up later in this article.

In this article, we will learn how to write autoencoders in Keras. The rest of the article is divided into the following sub-sections:

  1. Autoencoders in Keras and Deep Learning
  2. Python Implementation
  3. Types of Autoencoders
  4. Summary
  5. References
Autoencoders in Keras and Deep Learning | Taj Mahal (Agra, India) | Photo by Jovyn Chamb | Image Source

1. Autoencoders in Keras and Deep Learning

You have probably seen the sponge balls that are widely used as stress balls. Stress balls (or hand exercise balls) are squeezed in the hand and manipulated by the fingers to relieve muscle tension and stress, and are often prescribed in physical therapy.

Sponge Ball for stress relieving | Autoencoders in Keras and Deep Learning

Sponge balls are known for their elasticity, which brings them back to their original shape after they are squeezed. Because of this property, you can force these balls into smaller boxes, and they will regain (approximately) their original shape once the box is opened.

The idea behind autoencoders is very similar. An autoencoder takes a data point, squeezes it to fit into a smaller box (a smaller data shape), and can convert it back to its original form when required. As our data is not elastic like the stress balls, it will not return to its original form perfectly, which means that we lose some information in the process. An autoencoder is made up of two parts: the Encoder and the Decoder.

The Encoder part of an autoencoder is a neural network (generally non-recurrent) that takes the input data point and squeezes it into a lower-dimensional state (h, as shown in the figure below). This code (h) is expected to capture the important information in the input data point. The Decoder part of the autoencoder is again a neural network, which takes this hidden-code representation (h) of the data as input and reconstructs the original data point as output. Thus the input and the target output of an autoencoder are the same, and the most important part is the hidden state (the compressed representation of the data), which learns the important features of the data.

Autoencoders in Keras and Deep Learning | Image Source

In past decades, autoencoders were used primarily for dimensionality reduction and feature learning. More recently, the technique has also become popular for building generative models, in the same spirit as Generative Adversarial Networks (GANs).

Dimensionality reduction is the task of representing high-dimensional data points in fewer dimensions without losing much information. In this respect, autoencoders with linear activations are closely related to Principal Component Analysis (PCA). Non-linear activations make autoencoders more expressive than PCA: they can learn more complicated relationships in the data, so that the reconstruction loses less information for the same number of dimensions.
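
To make the PCA connection concrete, here is a minimal NumPy sketch (not part of the original notebook) that treats PCA as a linear encoder/decoder pair. The toy data and the names `pca_encode` and `pca_decode` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed example): 200 points in 5-D that lie close to a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

mean = X.mean(axis=0)
# Rows of Vt are the principal directions of the centered data.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # keep the top 2 directions

def pca_encode(x):
    """Linear 'encoder': project onto the top principal directions."""
    return (x - mean) @ components.T

def pca_decode(h):
    """Linear 'decoder': map the 2-D code back to the input space."""
    return h @ components + mean

codes = pca_encode(X)
reconstruction = pca_decode(codes)
reconstruction_mse = np.mean((X - reconstruction) ** 2)
print(codes.shape, reconstruction.shape, reconstruction_mse)
```

The encode/decode structure is the same as in an autoencoder; the difference is that PCA is restricted to linear maps, while a neural encoder and decoder can be non-linear.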

The next section shows how to implement autoencoders in Keras.


2. Python Implementation

In this exercise, we will train a simple autoencoder and examine how well it reconstructs the data. We will then review how well the reduced dimensions distinguish the output labels. We will use the MNIST handwritten digits dataset for our experiments.

Data

The following Python code loads the MNIST handwritten digits dataset and displays some sample images from it. The dataset has 60K training and 10K test images of handwritten digits, each 28x28 pixels. It also has a label for each digit, which we won't need for training. We will train an autoencoder that takes a digit image as input, reduces it to just 2 dimensions, and reconstructs it.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
from tensorflow.keras.datasets import mnist

(trainX, trainy), (testX, testy) = mnist.load_data()

print('Training data shapes: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Testing data shapes: X=%s, y=%s' % (testX.shape, testy.shape))

for j in range(5):
    i = np.random.randint(0, 10000)
    plt.subplot(550 + 1 + j)
    plt.imshow(trainX[i], cmap='gray')
    plt.title(trainy[i])
plt.show()
DataSet overview | Autoencoders in Keras and Deep Learning

These images are 2-D NumPy arrays filled with pixel intensities. We will convert each image into a one-dimensional array, as we want to pass it into a multi-layer perceptron model. (A convolutional neural network-based autoencoder would be a better fit for images.)

# Normalize pixel intensities to the range [0, 1]
trainX = trainX / 255
testX = testX / 255
# Flatten each 28x28 image into a single 784-element vector
train_data = np.reshape(trainX, (60000, 28*28))
test_data = np.reshape(testX, (10000, 28*28))
print(train_data.shape, test_data.shape)
(60000, 784) (10000, 784)

Encoder

Let’s define the encoder part of the model. It takes an array of size 784 as input and passes it through a multi-layer dense network. The final layer of the encoder has only two neurons, so each image is represented by just two floating-point numbers.

import tensorflow

# shape must be a tuple: (784,) rather than (784), which is just the integer 784
input_data = tensorflow.keras.layers.Input(shape=(784,))

encoder = tensorflow.keras.layers.Dense(100)(input_data)
encoder = tensorflow.keras.layers.Activation('relu')(encoder)

encoder = tensorflow.keras.layers.Dense(50)(encoder)
encoder = tensorflow.keras.layers.Activation('relu')(encoder)

encoder = tensorflow.keras.layers.Dense(25)(encoder)
encoder = tensorflow.keras.layers.Activation('relu')(encoder)

encoded = tensorflow.keras.layers.Dense(2)(encoder)

Decoder

Most of the time, the decoder part of the autoencoder network is a mirror image of the encoder. It takes the reduced dimensions as input and reconstructs the original image by increasing the dimensions back to the original size. Finally, we get an array of length 784 as output. This output array is expected to be similar to the input array.

decoder = tensorflow.keras.layers.Dense(25)(encoded)
decoder = tensorflow.keras.layers.Activation('relu')(decoder)

decoder = tensorflow.keras.layers.Dense(50)(decoder)
decoder = tensorflow.keras.layers.Activation('relu')(decoder)

decoder = tensorflow.keras.layers.Dense(100)(decoder)
decoder = tensorflow.keras.layers.Activation('relu')(decoder)

decoded = tensorflow.keras.layers.Dense(784)(decoder)

Loss

We need a loss function that tells us how similar the reconstructed image is to the original input image. As images are just pixel values, Mean Squared Error (‘mse’) is a good choice: it measures how close each pixel in the predicted image is to the corresponding pixel in the real image.
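
As a small worked example of what the ‘mse’ loss computes, here is a NumPy sketch. The helper name `mse` and the four made-up pixel values are assumptions for illustration; Keras computes the same quantity internally, averaged over each batch.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean of the squared per-pixel differences."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))

original = np.array([0.0, 0.5, 1.0, 1.0])       # four made-up pixel intensities
reconstructed = np.array([0.1, 0.5, 0.8, 1.0])  # an imperfect reconstruction

print(mse(original, original))       # identical images give 0.0
print(mse(original, reconstructed))  # (0.01 + 0 + 0.04 + 0) / 4, approximately 0.0125
```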

autoencoder = tensorflow.keras.models.Model(inputs=input_data, outputs=decoded)
autoencoder.compile(loss='mse', optimizer='adam')
autoencoder.summary()
Model: "functional_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
dense_11 (Dense)             (None, 100)               78500     
_________________________________________________________________
activation_9 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 50)                5050      
_________________________________________________________________
activation_10 (Activation)   (None, 50)                0         
_________________________________________________________________
dense_13 (Dense)             (None, 25)                1275      
_________________________________________________________________
activation_11 (Activation)   (None, 25)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 2)                 52        
_________________________________________________________________
dense_15 (Dense)             (None, 25)                75        
_________________________________________________________________
activation_12 (Activation)   (None, 25)                0         
_________________________________________________________________
dense_16 (Dense)             (None, 50)                1300      
_________________________________________________________________
activation_13 (Activation)   (None, 50)                0         
_________________________________________________________________
dense_17 (Dense)             (None, 100)               5100      
_________________________________________________________________
activation_14 (Activation)   (None, 100)               0         
_________________________________________________________________
dense_18 (Dense)             (None, 784)               79184     
=================================================================
Total params: 170,536
Trainable params: 170,536
Non-trainable params: 0
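
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The short sketch below (the helper name `dense_params` is an assumption, not a Keras function) reproduces the numbers for the layer sizes used in this tutorial.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer widths from input to output: encoder, 2-unit bottleneck, decoder.
layer_sizes = [784, 100, 50, 25, 2, 25, 50, 100, 784]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [78500, 5050, 1275, 52, 75, 1300, 5100, 79184]
print(sum(per_layer))  # 170536
```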

Training

For autoencoders, the input (x) and the target output (y) are the same. We will train our model on the training digit images for 30 epochs with a batch size of 64 images.

autoencoder.fit(train_data, train_data, epochs=30, batch_size=64, validation_data=(test_data, test_data))
Epoch 1/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0529 - val_loss: 0.0470
Epoch 2/30
938/938 [==============================] - 1s 1ms/step - loss: 0.0459 - val_loss: 0.0450
Epoch 3/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0441 - val_loss: 0.0434
Epoch 4/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0429 - val_loss: 0.0421
Epoch 5/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0420 - val_loss: 0.0417
Epoch 6/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0413 - val_loss: 0.0408
Epoch 7/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0405 - val_loss: 0.0400
Epoch 8/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0401 - val_loss: 0.0397
Epoch 9/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0396 - val_loss: 0.0394
Epoch 10/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0393 - val_loss: 0.0389
Epoch 11/30
938/938 [==============================] - 1s 2ms/step - loss: 0.0389 - val_loss: 0.0388
Epoch 12/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0389 - val_loss: 0.0390
Epoch 13/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0386 - val_loss: 0.0382
Epoch 14/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0385 - val_loss: 0.0383
Epoch 15/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0382 - val_loss: 0.0382
Epoch 16/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0380 - val_loss: 0.0380
Epoch 17/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0378 - val_loss: 0.0377
Epoch 18/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0377 - val_loss: 0.0377
Epoch 19/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0378 - val_loss: 0.0376
Epoch 20/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0375 - val_loss: 0.0378
Epoch 21/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0376 - val_loss: 0.0379
Epoch 22/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0376 - val_loss: 0.0375
Epoch 23/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0373 - val_loss: 0.0372
Epoch 24/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0376 - val_loss: 0.0378
Epoch 25/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0375 - val_loss: 0.0378
Epoch 26/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0372 - val_loss: 0.0381
Epoch 27/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0372 - val_loss: 0.0369
Epoch 28/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0370 - val_loss: 0.0371
Epoch 29/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0370 - val_loss: 0.0369
Epoch 30/30
938/938 [==============================] - 2s 2ms/step - loss: 0.0368 - val_loss: 0.0371
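
The 938 at the start of each log line is the number of batches (steps) per epoch. A quick check of where that number comes from, using the dataset size and batch size from this tutorial:

```python
import math

num_samples = 60000  # MNIST training images
batch_size = 64

steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 938
print(last_batch_size)  # 32 images in the final, partial batch
```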

Prediction

Time to check how well the model performs on the test dataset. We will plot the original images as well as the reconstructed ones to see how well our model is doing:

# Real Images
for i in range(5):
    plt.subplot(550 + 1 + i)
    plt.imshow(testX[i], cmap='gray')
plt.show()

# Reconstructed Images: run the first five test digits through the autoencoder in one batch
reconstructed = autoencoder.predict(test_data[:5])
for i in range(5):
    plt.subplot(550 + 1 + i)
    # Reshape each 784-element output back into a 28x28 image
    plt.imshow(reconstructed[i].reshape(28, 28), cmap='gray')
plt.show()

The first row has 5 real images, and the second row has the corresponding reconstructed images. The output images are a little blurry, but they are still quite readable. In the last image, our model mistakes a 4 for a 9, as the two digits are very similar in shape. With more training and a deeper network, the model should be able to learn the difference.

Model output | Autoencoders in Keras and Deep Learning

Reduced Dimensions

We saw that this small network was capable of reconstructing the original image using just the two numbers from the encoding layer h (the bottleneck layer). These reduced dimensions can now be used as features for these images in other applications. To verify this, let’s plot these embeddings (features) for all the digits in the test dataset.

# Encoder-only model: reuse the input tensor and the 2-unit bottleneck defined above
# (auto-generated layer names such as 'dense_14' change from session to session)
dr_model = tensorflow.keras.models.Model(inputs=input_data, outputs=encoded)
dr_model.summary()

# Encode the whole test set in one batched call instead of one image at a time
codes = dr_model.predict(test_data)

df = pd.DataFrame()
df['x'] = codes[:, 0]
df['y'] = codes[:, 1]
df['z'] = ["digit-"+str(k) for k in testy]

plt.figure(figsize=(8, 6))
sns.scatterplot(x='x', y='y', hue='z', data=df)
plt.show()

Each image from the test dataset is represented with 2 dimensions and is plotted as a point in the scatterplot below, where the color represents the label of the image. We can see that images of the same digit lie close together in the latent space. Even this simple autoencoder is able to separate the digits into different clusters:
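
One possible way to put a number on this visual impression (not something the original notebook does) is to check how often a point lies closest to the centroid of its own class. The sketch below uses synthetic 2-D codes as a stand-in for the encoder output; the function name `nearest_centroid_accuracy` is an assumption for illustration.

```python
import numpy as np

def nearest_centroid_accuracy(codes, labels):
    """Fraction of points whose nearest class centroid matches their own label."""
    classes = np.unique(labels)
    centroids = np.stack([codes[labels == c].mean(axis=0) for c in classes])
    # Distance from every point to every centroid: shape (n_points, n_classes)
    distances = np.linalg.norm(codes[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[np.argmin(distances, axis=1)]
    return float(np.mean(predicted == labels))

# Synthetic stand-in for the 2-D codes: two well-separated blobs.
rng = np.random.default_rng(0)
codes = np.concatenate([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])
labels = np.array([0] * 50 + [1] * 50)

print(nearest_centroid_accuracy(codes, labels))  # 1.0 for these well-separated blobs
```

With the tutorial's variables, the same function could be called on the output of `dr_model.predict(test_data)` together with `testy`; a score well above 0.1 (chance level for ten digits) would support the claim that the digits form clusters.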

Plotting Latent Dimensions | Autoencoders in Keras and Deep Learning

Since convolutional neural networks are usually the better choice for understanding and representing images, the results would likely have been even better with a CNN-based autoencoder.

Here is my article where I have trained a CNN-based autoencoder to predict future frames in a video-

Python Predicts PUBG Mobile


3. Types of Autoencoders

There are various regularization techniques that make autoencoders robust, so that they learn the important features instead of just memorizing the input data and copying it to the output.

Here are a few regularized variants of autoencoders that work well in practice:

Sparse autoencoders (SAE)

Sparse autoencoders are typically used when we need to learn features for other tasks, such as classification. These autoencoders add a sparsity penalty on the embedding layer (h-layer) to the reconstruction error.

This penalty can be thought of as a regularization term added to the simple autoencoder, whose only task was to copy the input to the output. The sparsity penalty makes the model learn more representative features of the dataset instead of just copying the input to the output.
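
A minimal NumPy sketch of such a loss, assuming an L1 penalty on the code h (one common choice; the function name and the `sparsity_weight` value are illustrative assumptions):

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, h, sparsity_weight=1e-3):
    """Reconstruction error plus an L1 sparsity penalty on the code h."""
    reconstruction_error = np.mean((x - x_hat) ** 2)
    sparsity_penalty = np.mean(np.abs(h))
    return float(reconstruction_error + sparsity_weight * sparsity_penalty)

x = np.array([0.2, 0.4, 0.6])
x_hat = x.copy()  # assume a perfect reconstruction, to isolate the penalty

dense_code = np.array([1.0, -2.0, 3.0, -4.0])  # every unit is active
sparse_code = np.array([0.0, 0.0, 3.0, 0.0])   # only one unit is active

print(sparse_autoencoder_loss(x, x_hat, dense_code))
print(sparse_autoencoder_loss(x, x_hat, sparse_code))  # smaller: the sparse code is penalized less
```

In Keras, a comparable effect can be obtained by passing `activity_regularizer=tensorflow.keras.regularizers.l1(...)` to the bottleneck `Dense` layer; the weight of the penalty is a hyperparameter that has to be tuned.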

Denoising autoencoders (DAE)

Denoising autoencoders are trained by changing the reconstruction error term of the cost function (or loss). Traditionally, the input and output data points of an autoencoder are the same, but in a denoising autoencoder the input data points are noisy, while the reconstructed data points are compared with the clean data points (without noise). In this manner, a DAE learns to ignore (or remove) noise from a noisy dataset.

Denoising autoencoders learn useful features from the input data and discard useless information (such as noise) at reconstruction time. This pushes them toward better representations of the data.
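
The sketch below shows one way the noisy inputs could be prepared, assuming Gaussian noise and pixel values in [0, 1]. The helper name `add_gaussian_noise`, the noise level, and the random stand-in images are assumptions for illustration.

```python
import numpy as np

def add_gaussian_noise(images, noise_std=0.3, seed=0):
    """Return a noisy copy of the images, clipped back to the valid range [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_std * rng.normal(size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for a few flattened, normalized images (random values, not real digits).
clean = np.random.default_rng(1).random((4, 784))
noisy = add_gaussian_noise(clean)

print(noisy.shape, noisy.min(), noisy.max())

# A denoising autoencoder would then be trained with the noisy images as input
# and the clean images as target, e.g. autoencoder.fit(noisy, clean, ...).
```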

Contractive autoencoders (CAE)

Denoising autoencoders typically work well when the noise in the dataset is small and limited, whereas contractive autoencoders aim to be robust to perturbations of the input more generally. They add a contractive penalty on the encoder output that encourages the derivatives of the encoder with respect to its input to be as small as possible. This penalty is the sum of the squared partial derivatives of the encoder function (the squared Frobenius norm of its Jacobian).

Contractive autoencoders are trained to learn the distribution of the input instead of copying it, so their encodings change little under slight variations in the input.
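
For a single-layer encoder the contractive penalty has a simple closed form. The sketch below assumes a sigmoid encoder h = sigmoid(Wx + b); the weights and input are random placeholders, and the function name `contractive_penalty` is an assumption for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """Sum of squared partial derivatives dh_j/dx_i for h = sigmoid(W @ x + b)."""
    h = sigmoid(W @ x + b)
    # For a sigmoid unit: dh_j/dx_i = h_j * (1 - h_j) * W[j, i]
    jacobian = (h * (1.0 - h))[:, None] * W
    return float(np.sum(jacobian ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 6))  # 6 inputs -> 2 code units (placeholder weights)
b = rng.normal(size=2)
x = rng.normal(size=6)

penalty = contractive_penalty(x, W, b)
print(penalty)
```

During training this value would be multiplied by a small weight and added to the reconstruction error, so that the encoder is discouraged from reacting strongly to small changes in x.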

Variational autoencoders (VAE)

Variational autoencoders are different from the traditional autoencoders discussed above (denoising, sparse, and contractive). They are generative models, like Generative Adversarial Networks (GANs).

Traditional autoencoders learn latent representations of the inputs using the reconstruction error alone. The distribution of these latent representations is usually irregular (it depends on the inputs). Variational autoencoders use a reconstruction loss together with a KL-divergence loss. The KL-divergence loss encourages the model to learn broader distributions of latent features.

The KL-divergence term pushes these latent features toward a Gaussian distribution. After training, we can sample features from this latent distribution and pass them through the decoder, which gives a generative model capable of generating new data without any input.
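
A minimal NumPy sketch of the two ingredients mentioned above, assuming the usual setup in which the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension and the target distribution is a standard normal. The function names are assumptions for illustration.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0))

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, 1)."""
    epsilon = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * epsilon

rng = np.random.default_rng(0)

print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))            # 0.0: already a standard normal
print(kl_to_standard_normal(np.array([1.0, -1.0]), np.zeros(2)))  # 1.0: the means are shifted

z = sample_latent(np.zeros(2), np.zeros(2), rng)
print(z.shape)  # (2,)
```

The KL term is added to the reconstruction loss during training; at generation time, a vector sampled from the standard normal is passed to the decoder in place of an encoded input.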


4. Summary

In this tutorial, we have gained a basic understanding of autoencoders in Keras and deep learning. We have implemented a simple autoencoder network to learn representations of the MNIST handwritten digit dataset. We have also seen that the model was able to learn effective latent features for different digits.

We then discussed different variants of autoencoders that make this algorithm robust by using different regularization techniques. I will explain these techniques in detail in my future tutorials.

You can find the python notebook here:

Github Repo: https://github.com/kartikgill/Autoencoders

Thanks for reading. Hope this article was helpful. Kindly give your feedback/suggestions by commenting below. See you in the next article.



Learn More (related research papers)

  1. Autoencoders, Unsupervised Learning, and Deep Architectures
  2. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
  3. Generative Adversarial Nets
  4. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
  5. Recent Advances in Autoencoder-Based Representation Learning
