Sentiment Classification with Deep Learning: RNN, LSTM, and CNN

October 21, 2020

Sentiment classification is a common task in Natural Language Processing (NLP), and there are various ways to perform it in Machine Learning (ML). In this article, we talk about how to perform sentiment classification with Deep Learning (Artificial Neural Networks).

In my previous two articles, we already discussed how to perform sentiment analysis using different traditional machine learning algorithms and compared their performance on the IMDB movie reviews dataset.

Here are the links to my old articles on Sentiment Classification:

  1. Sentiment Analysis with Python: Bag of Words
  2. Sentiment Analysis with Python: TFIDF features

In this article, we will experiment with neural network-based architectures for the task of sentiment classification with Deep Learning techniques. We will try four different architectures: Dense networks, Recurrent Neural Networks, Long Short-Term Memory networks, and finally 1-dimensional Convolutional Neural Networks.

All four architectures use the Embedding layer from tensorflow.keras to learn word embeddings during training for each input review. These embeddings are then passed to the four different architectures described above. We keep all the models approximately similar in size (in terms of trainable parameters) and compare their performance later. Let's get started with Sentiment Classification with Deep Learning.

The rest of the article is divided into the following parts-

  1. Preparing IMDB reviews for Sentiment Analysis
  2. Multi-layer Perceptron (Dense Neural Network) model
  3. RNN (Recurrent Neural Network)
  4. LSTM (Long Short Term Memory)
  5. 1D CNN (Convolutional Neural Network)
  6. Summary
Cover image by Markus Winkler.

Preparing IMDB reviews for Sentiment Analysis

Just like my previous articles on Sentiment Analysis (links in the Introduction), we will work on the IMDB movie reviews dataset and experiment with the four deep learning architectures described above.

Quick dataset background: the IMDB movie review dataset is a collection of 50K movie reviews tagged with their true sentiment values; 25K reviews belong to the 'positive' category and the remaining 25K belong to the 'negative' category.

You can download this dataset from Kaggle (the URL is provided in the References below). Here is a quick peek into the data-

import pandas as pd

# Load the 50K labelled IMDB reviews
data = pd.read_csv("data/IMDB Dataset.csv")
print(data.shape)
data.head(10)

These movie reviews need some cleaning, as we can clearly see that there are a few HTML tags and special characters present in the sentences. These symbols are not helpful in determining the sentiment of a given review, so it is a good idea to remove them and pass the cleaner data to the algorithm.

Data Cleaning

Let's remove the HTML tags and other special characters from the reviews, as they do not add any value to the sentiment of a given review (sentence). Additionally, let's convert all the reviews to lowercase so that 'Happy' and 'happy' look the same to the algorithm.

Below are some simple data cleaning techniques that are commonly used in natural language processing tasks-

import re
from bs4 import BeautifulSoup

def remove_html(text):
    # Strip HTML tags such as <br /> and keep only the visible text
    bs = BeautifulSoup(text, "html.parser")
    return ' ' + bs.get_text() + ' '

def keep_only_letters(text):
    # Replace everything except letters and whitespace with a space
    return re.sub(r'[^a-zA-Z\s]', ' ', text)

def convert_to_lowercase(text):
    return text.lower()

def clean_reviews(text):
    text = remove_html(text)
    text = keep_only_letters(text)
    text = convert_to_lowercase(text)
    return text

data['review'] = data['review'].apply(clean_reviews)

Depending upon the data, requirements, and problem statement, the data cleaning steps may differ. The next step is to find out how many different words are present in our dataset.

Data Vocabulary Creation

Let's check how many unique words are present in the dataset and create a vocabulary out of them. This vocabulary will help us convert the reviews into numerical representations, since ML algorithms (at least the current implementations) only deal with numbers, not raw text.

In our vocabulary, each unique word is assigned a unique index value. Using these index mappings, a review can be converted into a list of integers, which will later be passed to the model (the ML algorithm) as input.
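As a toy illustration (not part of the actual pipeline, and the words are made up), this is what the mapping looks like for a tiny five-word vocabulary-

# Toy example: a tiny vocabulary and one encoded "review"
vocab = {'the': 0, 'movie': 1, 'was': 2, 'great': 3, 'boring': 4}
review = "the movie was great"
print([vocab[word] for word in review.split()])   # [0, 1, 2, 3]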

Let's first split our dataset into train and test sets, because this vocabulary should be built only on the training data: the test data is supposed to be hidden from the model, and it may contain words unseen by the algorithm, since we cannot control what kind of data comes in for inference in the future.

imdb_train = data[:40000]
imdb_test = data[40000:]

We take the first 40K reviews as the training dataset and keep the remaining 10K reviews aside as the test set. We will use the same partitions in all experiments to make sure that the comparison of results is fair.

Let's create our word vocabulary from the training dataset in a Python notebook-

from collections import Counter

# Count every word occurrence in the (cleaned) training reviews
counter = Counter([word for review in imdb_train['review'] for word in review.split()])

# Put the counts into a DataFrame sorted by frequency
df = pd.DataFrame({'key': list(counter.keys()), 'value': list(counter.values())})
df.sort_values(by='value', ascending=False, inplace=True)

print(df.shape[0])                              # number of unique words
print(df[:10000].value.sum() / df.value.sum())  # fraction of text covered by the top 10K words
top_10k_words = list(df[:10000].key.values)
92279
0.9477437089109378

There are about 92K unique words in the training dataset. For simplicity, we will only consider the 10K most frequent words. As the output above shows, these 10K words cover almost 95% of the total text in the training dataset, which seems good enough.

Converting Reviews into lists of integers

Now that our vocabulary is ready, it is time to convert the reviews into numerical form by replacing each word with its integer index from the vocabulary. All remaining out-of-vocabulary (unseen) words are assigned a common index, 10000, since our vocabulary already uses the indices 0 to 9999.

We will encode the training dataset as well as the test dataset using the same vocabulary. After this step, our reviews are ready to be passed into ML models. Here is the Python code-

import numpy as np

# Map each of the top-10K words to its index once; a dict lookup is much
# faster than calling list.index() for every word.
word_to_index = {word: index for index, word in enumerate(top_10k_words)}

def get_encoded_input(review):
    words = review.split()
    if len(words) > 500:                 # keep only the first 500 words of long reviews
        words = words[:500]
    encoding = [word_to_index.get(word, 10000) for word in words]  # 10000 = out-of-vocabulary index
    while len(encoding) < 500:           # pad short reviews up to 500 entries
        encoding.append(10001)           # 10001 = padding index
    return encoding

training_data = np.array([get_encoded_input(review) for review in imdb_train['review']])
testing_data = np.array([get_encoded_input(review) for review in imdb_test['review']])
print (training_data.shape, testing_data.shape)
(40000, 500) (10000, 500)

Note: the maximum length of a review is capped at 500 words. If a review is shorter than 500 words, it is padded with a separate index (10001 in this case) to make it 500 words long; if it is longer, only the first 500 words are kept. We do this because the model expects all the reviews to have the same length.

The choice of 500 was based on the distribution of review word lengths: we wanted it to be large enough that we do not lose much data, while a very large value would make the model more complex. (This is a parameter that can be tuned.)
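As a side note, the same truncation and padding can also be done with the built-in Keras utility pad_sequences. Here is a rough equivalent of the encoding step above (an alternative sketch that reuses the word_to_index mapping defined earlier; the results reported in this article were produced with get_encoded_input, not with this snippet)-

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Encode first, then let Keras truncate/pad at the end of each sequence
encoded_reviews = [[word_to_index.get(word, 10000) for word in review.split()]
                   for review in imdb_train['review']]
padded = pad_sequences(encoded_reviews, maxlen=500, padding='post',
                       truncating='post', value=10001)
print(padded.shape)   # (40000, 500)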

Here is how we can plot a histogram of review lengths in our dataset. It shows that the majority of reviews have word lengths close to 200-

import matplotlib.pyplot as plt

# Word length of every review in the dataset
data['review_word_length'] = [len(review.split()) for review in data['review']]
data['review_word_length'].plot(kind='hist', bins=30)
plt.title('Word length distribution')
plt.show()

If we consider the cutoff of 500 here, we can see that most of the reviews are shorter than this, so it seems like a fair value (I have not experimented with other values; this can be tuned).

Output labels into numerical form

Finally, let's convert our output labels into numerical form. This is the last step of data preparation; after this, we will jump into training the various models. Positive sentiment is represented by 1 and negative sentiment by 0.

train_labels = [1 if sentiment=='positive' else 0 for sentiment in imdb_train['sentiment']]
test_labels = [1 if sentiment=='positive' else 0 for sentiment in imdb_test['sentiment']]
train_labels = np.array(train_labels)
test_labels = np.array(test_labels)

Multi-layer Perceptron (Dense Neural Network) model

Let's define a simple Dense network to perform sentiment classification with deep learning. The first layer of the network will be an Embedding layer (Keras Embedding Layer) that learns embeddings for the different words during network training itself.

Embedding Layer (Keras Embedding Layer): this layer trains along with the rest of the network and learns a fixed-size embedding for every token (word, in our case). We pass the integer-encoded review of length 500 to the embedding layer, together with the vocabulary size (10002). Conceptually, the layer maps each of the 10002 possible indices to a dense vector, so a review that would be 10002 * 500 as one-hot vectors (unique indices/words * word length) becomes Embedding_size * 500: each word in the review is represented by an embedding vector of size Embedding_size instead of a sparse 10002-dimensional encoding.

In our experiments, with output_dim=32, the model learns an embedding of size 32 for each word of the dictionary we defined earlier. Thus, for each input review we get 500 embeddings of size 32. After flattening these 500 embeddings, we get a flat (one-dimensional) tensor of size 500 * 32 = 16000, which is then passed through multiple densely connected layers. The final output layer with sigmoid activation produces a probability for each review, representing the likelihood of the input review being positive in sentiment; if the probability is close to 0, the sentiment is inferred as negative.
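To make the shape transformation concrete, here is a quick standalone check of the Embedding layer (illustrative only; it is not part of the article's pipeline, and the batch of indices is random)-

import numpy as np
import tensorflow

# A batch of two fake encoded reviews, each 500 integer indices long
embedding_layer = tensorflow.keras.layers.Embedding(input_dim=10002, output_dim=32, input_length=500)
dummy_batch = np.random.randint(0, 10002, size=(2, 500))
print(embedding_layer(dummy_batch).shape)   # (2, 500, 32): one 32-dim vector per word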

Here is the Python implementation (using the tensorflow.keras API)-

import tensorflow
from tensorflow.keras.layers import Activation

input_data = tensorflow.keras.layers.Input(shape=(500,))

data = tensorflow.keras.layers.Embedding(input_dim=10002, output_dim=32, input_length=500)(input_data)

data = tensorflow.keras.layers.Flatten()(data)

data = tensorflow.keras.layers.Dense(16)(data)
data = tensorflow.keras.layers.Activation('relu')(data)
data = tensorflow.keras.layers.Dropout(0.5)(data)

data = tensorflow.keras.layers.Dense(8)(data)
data = tensorflow.keras.layers.Activation('relu')(data)
data = tensorflow.keras.layers.Dropout(0.5)(data)

data = tensorflow.keras.layers.Dense(4)(data)
data = tensorflow.keras.layers.Activation('relu')(data)
data = tensorflow.keras.layers.Dropout(0.5)(data)

data = tensorflow.keras.layers.Dense(1)(data)
output_data = tensorflow.keras.layers.Activation('sigmoid')(data)

model = tensorflow.keras.models.Model(inputs=input_data, outputs=output_data)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

The model is compiled with the 'binary_crossentropy' loss, as this is a binary classification problem. Here is the summary of the model-

Model: "functional_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_9 (InputLayer)         [(None, 500)]             0         
_________________________________________________________________
embedding_8 (Embedding)      (None, 500, 32)           320064    
_________________________________________________________________
flatten_8 (Flatten)          (None, 16000)             0         
_________________________________________________________________
dense_32 (Dense)             (None, 16)                256016    
_________________________________________________________________
activation_32 (Activation)   (None, 16)                0         
_________________________________________________________________
dropout_24 (Dropout)         (None, 16)                0         
_________________________________________________________________
dense_33 (Dense)             (None, 8)                 136       
_________________________________________________________________
activation_33 (Activation)   (None, 8)                 0         
_________________________________________________________________
dropout_25 (Dropout)         (None, 8)                 0         
_________________________________________________________________
dense_34 (Dense)             (None, 4)                 36        
_________________________________________________________________
activation_34 (Activation)   (None, 4)                 0         
_________________________________________________________________
dropout_26 (Dropout)         (None, 4)                 0         
_________________________________________________________________
dense_35 (Dense)             (None, 1)                 5         
_________________________________________________________________
activation_35 (Activation)   (None, 1)                 0         
=================================================================
Total params: 576,257
Trainable params: 576,257
Non-trainable params: 0
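As a quick aside on the loss: for a single example with true label y and predicted probability p, binary cross-entropy is -(y*log(p) + (1-y)*log(1-p)). Here is a small sanity check against the Keras implementation (illustrative only; this is not part of the training code)-

import numpy as np
import tensorflow

# One positive example predicted with probability 0.8
y_true, y_pred = 1.0, 0.8
manual = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
keras_value = tensorflow.keras.losses.binary_crossentropy([y_true], [y_pred]).numpy()
print(manual, keras_value)   # both close to 0.223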

Let’s train this model and see how it performs-

model.fit(training_data, train_labels, epochs=10, batch_size=256, validation_data=(testing_data, test_labels))
Epoch 1/10
157/157 [==============================] - 3s 20ms/step - loss: 0.6934 - accuracy: 0.5025 - val_loss: 0.6931 - val_accuracy: 0.5005
Epoch 2/10
157/157 [==============================] - 3s 20ms/step - loss: 0.6911 - accuracy: 0.5161 - val_loss: 0.6746 - val_accuracy: 0.6201
Epoch 3/10
157/157 [==============================] - 3s 20ms/step - loss: 0.6187 - accuracy: 0.6736 - val_loss: 0.4977 - val_accuracy: 0.8556
Epoch 4/10
157/157 [==============================] - 3s 20ms/step - loss: 0.5396 - accuracy: 0.7637 - val_loss: 0.4433 - val_accuracy: 0.8651
Epoch 5/10
157/157 [==============================] - 3s 19ms/step - loss: 0.4654 - accuracy: 0.8156 - val_loss: 0.4423 - val_accuracy: 0.8325
Epoch 6/10
157/157 [==============================] - 3s 21ms/step - loss: 0.3999 - accuracy: 0.8511 - val_loss: 0.4657 - val_accuracy: 0.8680
Epoch 7/10
157/157 [==============================] - 3s 20ms/step - loss: 0.3679 - accuracy: 0.8657 - val_loss: 0.4480 - val_accuracy: 0.8535
Epoch 8/10
157/157 [==============================] - 3s 20ms/step - loss: 0.3295 - accuracy: 0.8816 - val_loss: 0.5505 - val_accuracy: 0.8697
Epoch 9/10
157/157 [==============================] - 3s 19ms/step - loss: 0.3064 - accuracy: 0.8915 - val_loss: 0.5173 - val_accuracy: 0.8599
Epoch 10/10
157/157 [==============================] - 3s 22ms/step - loss: 0.2878 - accuracy: 0.8983 - val_loss: 0.5476 - val_accuracy: 0.8624

This was a dense model with roughly 570K parameters, and it achieves a best accuracy of close to 87% on the test dataset.
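Note that the best validation accuracy is reached before the final epoch, so in practice you may want to keep the best-performing weights rather than the last ones. A minimal sketch with a checkpoint callback (an illustrative assumption, not something used for the runs above; 'best_dense_model.h5' is a hypothetical filename)-

# Save the weights from the epoch with the best validation accuracy
checkpoint = tensorflow.keras.callbacks.ModelCheckpoint(
    'best_dense_model.h5', monitor='val_accuracy',
    save_best_only=True, verbose=1)

model.fit(training_data, train_labels, epochs=10, batch_size=256,
          validation_data=(testing_data, test_labels),
          callbacks=[checkpoint])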


RNN (Recurrent Neural Network)

In the last experiment, we flattened the learned embeddings, which means we were not utilizing the sequential information of the words in the reviews. This time, we will try to capture that sequential information using a simple Recurrent Neural Network (RNN).

Here is the Python code for the RNN-based model-

import tensorflow
from tensorflow.keras.layers import Activation

input_data = tensorflow.keras.layers.Input(shape=(500,))

data = tensorflow.keras.layers.Embedding(input_dim=10002, output_dim=32, input_length=500)(input_data)

data = tensorflow.keras.layers.Bidirectional(tensorflow.keras.layers.SimpleRNN(50))(data)

data = tensorflow.keras.layers.Dense(1)(data)
output_data = tensorflow.keras.layers.Activation('sigmoid')(data)

model = tensorflow.keras.models.Model(inputs=input_data, outputs=output_data)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 500)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 500, 32)           320064    
_________________________________________________________________
bidirectional (Bidirectional (None, 100)               8300      
_________________________________________________________________
dense (Dense)                (None, 1)                 101       
_________________________________________________________________
activation (Activation)      (None, 1)                 0         
=================================================================
Total params: 328,465
Trainable params: 328,465
Non-trainable params: 0

Let’s train it-

model.fit(training_data, train_labels, epochs=10, batch_size=256, validation_data=(testing_data, test_labels))
Epoch 1/10
157/157 [==============================] - 48s 305ms/step - loss: 0.6839 - accuracy: 0.5548 - val_loss: 0.6907 - val_accuracy: 0.5250
Epoch 2/10
157/157 [==============================] - 55s 352ms/step - loss: 0.6743 - accuracy: 0.5792 - val_loss: 0.6689 - val_accuracy: 0.5985
Epoch 3/10
157/157 [==============================] - 49s 311ms/step - loss: 0.6374 - accuracy: 0.6433 - val_loss: 0.6313 - val_accuracy: 0.6435
Epoch 4/10
157/157 [==============================] - 53s 339ms/step - loss: 0.5445 - accuracy: 0.7265 - val_loss: 0.5861 - val_accuracy: 0.6838
Epoch 5/10
157/157 [==============================] - 51s 323ms/step - loss: 0.3828 - accuracy: 0.8332 - val_loss: 0.4239 - val_accuracy: 0.8258
Epoch 6/10
157/157 [==============================] - 52s 334ms/step - loss: 0.2664 - accuracy: 0.8942 - val_loss: 0.5263 - val_accuracy: 0.7429
Epoch 7/10
157/157 [==============================] - 52s 334ms/step - loss: 0.2234 - accuracy: 0.9125 - val_loss: 0.5065 - val_accuracy: 0.7780
Epoch 8/10
157/157 [==============================] - 50s 321ms/step - loss: 0.1269 - accuracy: 0.9573 - val_loss: 0.4620 - val_accuracy: 0.8289
Epoch 9/10
157/157 [==============================] - 51s 325ms/step - loss: 0.0713 - accuracy: 0.9796 - val_loss: 0.5079 - val_accuracy: 0.8270
Epoch 10/10
157/157 [==============================] - 49s 310ms/step - loss: 0.0426 - accuracy: 0.9895 - val_loss: 0.5668 - val_accuracy: 0.8280

Simple RNN-based models are not very good at capturing long-term context, so this model does not perform as well and achieves an accuracy of close to 83%.


LSTM (Long Short-Term Memory)

LSTM (Long Short-Term Memory) networks were designed to address the problem of remembering longer contexts (compared to simple RNNs). In this experiment, we will pass our embeddings to bidirectional LSTM cells. Let's check how this works on our test dataset.

Here is the Python implementation of the LSTM-based model-

import tensorflow
from tensorflow.keras.layers import Activation

input_data = tensorflow.keras.layers.Input(shape=(500,))

data = tensorflow.keras.layers.Embedding(input_dim=10002, output_dim=32, input_length=500)(input_data)

data = tensorflow.keras.layers.Bidirectional(tensorflow.keras.layers.LSTM(50))(data)

data = tensorflow.keras.layers.Dense(1)(data)
output_data = tensorflow.keras.layers.Activation('sigmoid')(data)

model = tensorflow.keras.models.Model(inputs=input_data, outputs=output_data)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "functional_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_17 (InputLayer)        [(None, 500)]             0         
_________________________________________________________________
embedding_16 (Embedding)     (None, 500, 32)           320064    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 100)               33200     
_________________________________________________________________
dense_16 (Dense)             (None, 1)                 101       
_________________________________________________________________
activation_15 (Activation)   (None, 1)                 0         
=================================================================
Total params: 353,365
Trainable params: 353,365
Non-trainable params: 0

We have defined a pretty simple LSTM-based model with just 50 units. Let's train it and check how it performs on our test dataset.

model.fit(training_data, train_labels, epochs=10, batch_size=128, validation_data=(testing_data, test_labels))
Epoch 1/10
313/313 [==============================] - 140s 449ms/step - loss: 0.5606 - accuracy: 0.7155 - val_loss: 0.4979 - val_accuracy: 0.7681
Epoch 2/10
313/313 [==============================] - 142s 453ms/step - loss: 0.3372 - accuracy: 0.8625 - val_loss: 0.3041 - val_accuracy: 0.8714
Epoch 3/10
313/313 [==============================] - 142s 455ms/step - loss: 0.2705 - accuracy: 0.8979 - val_loss: 0.3145 - val_accuracy: 0.8777
Epoch 4/10
313/313 [==============================] - 155s 496ms/step - loss: 0.2132 - accuracy: 0.9235 - val_loss: 0.3785 - val_accuracy: 0.8484
Epoch 5/10
313/313 [==============================] - 153s 488ms/step - loss: 0.2182 - accuracy: 0.9210 - val_loss: 0.3360 - val_accuracy: 0.8748
Epoch 6/10
313/313 [==============================] - 159s 507ms/step - loss: 0.1941 - accuracy: 0.9291 - val_loss: 0.3379 - val_accuracy: 0.8725
Epoch 7/10
313/313 [==============================] - 181s 577ms/step - loss: 0.1592 - accuracy: 0.9448 - val_loss: 0.3238 - val_accuracy: 0.8800
Epoch 8/10
313/313 [==============================] - 164s 523ms/step - loss: 0.1471 - accuracy: 0.9496 - val_loss: 0.3347 - val_accuracy: 0.8791
Epoch 9/10
313/313 [==============================] - 166s 531ms/step - loss: 0.1371 - accuracy: 0.9535 - val_loss: 0.3633 - val_accuracy: 0.8805
Epoch 10/10
313/313 [==============================] - 196s 628ms/step - loss: 0.1229 - accuracy: 0.9598 - val_loss: 0.4345 - val_accuracy: 0.8161

This was a pretty simple model with only about 350K parameters, yet it achieves an accuracy of close to 88% on our test data.


1D CNN (Convolutional Neural Network)

Recent research in NLP and speech recognition shows that Convolutional Neural Networks can be very good at capturing long-range sequential dependencies. Additionally, because these networks are convolutional, they take advantage of parallel computation and are much faster and more scalable in practice than LSTM-based networks.

In this experiment, we will pass the sequential word embeddings into multiple stacked layers of a 1D convolutional network for the task of sentiment classification with deep learning.

Here is the python implementation for CNN based sentiment classifier-

import tensorflow

input_data = tensorflow.keras.layers.Input(shape=(500,))

data = tensorflow.keras.layers.Embedding(input_dim=10002, output_dim=32, input_length=500)(input_data)

data = tensorflow.keras.layers.Conv1D(50, kernel_size=3, activation='relu')(data)
data = tensorflow.keras.layers.MaxPool1D(pool_size=2)(data)

data = tensorflow.keras.layers.Conv1D(40, kernel_size=3, activation='relu')(data)
data = tensorflow.keras.layers.MaxPool1D(pool_size=2)(data)

data = tensorflow.keras.layers.Conv1D(30, kernel_size=3, activation='relu')(data)
data = tensorflow.keras.layers.MaxPool1D(pool_size=2)(data)

data = tensorflow.keras.layers.Conv1D(30, kernel_size=3, activation='relu')(data)
data = tensorflow.keras.layers.MaxPool1D(pool_size=2)(data)

data = tensorflow.keras.layers.Flatten()(data)

data = tensorflow.keras.layers.Dense(20)(data)
data = tensorflow.keras.layers.Dropout(0.5)(data)

data = tensorflow.keras.layers.Dense(1)(data)
output_data = tensorflow.keras.layers.Activation('sigmoid')(data)

model = tensorflow.keras.models.Model(inputs=input_data, outputs=output_data)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "functional_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 500)]             0         
_________________________________________________________________
embedding_4 (Embedding)      (None, 500, 32)           320064    
_________________________________________________________________
conv1d_9 (Conv1D)            (None, 498, 50)           4850      
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 249, 50)           0         
_________________________________________________________________
conv1d_10 (Conv1D)           (None, 247, 40)           6040      
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 123, 40)           0         
_________________________________________________________________
conv1d_11 (Conv1D)           (None, 121, 30)           3630      
_________________________________________________________________
max_pooling1d_11 (MaxPooling (None, 60, 30)            0         
_________________________________________________________________
conv1d_12 (Conv1D)           (None, 58, 30)            2730      
_________________________________________________________________
max_pooling1d_12 (MaxPooling (None, 29, 30)            0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 870)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 20)                17420     
_________________________________________________________________
dropout_3 (Dropout)          (None, 20)                0         
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 21        
_________________________________________________________________
activation_4 (Activation)    (None, 1)                 0         
=================================================================
Total params: 354,755
Trainable params: 354,755
Non-trainable params: 0

Though this model is deeper, it is still parameter-efficient. Let's see how fast it trains and how well it performs-

model.fit(training_data, train_labels, epochs=10, batch_size=256, validation_data=(testing_data, test_labels))
Epoch 1/10
157/157 [==============================] - 28s 180ms/step - loss: 0.6393 - accuracy: 0.5771 - val_loss: 0.3717 - val_accuracy: 0.8485
Epoch 2/10
157/157 [==============================] - 27s 175ms/step - loss: 0.2787 - accuracy: 0.8913 - val_loss: 0.2644 - val_accuracy: 0.8974
Epoch 3/10
157/157 [==============================] - 28s 177ms/step - loss: 0.1885 - accuracy: 0.9319 - val_loss: 0.2611 - val_accuracy: 0.9004
Epoch 4/10
157/157 [==============================] - 30s 193ms/step - loss: 0.1398 - accuracy: 0.9523 - val_loss: 0.3670 - val_accuracy: 0.8766
Epoch 5/10
157/157 [==============================] - 30s 191ms/step - loss: 0.1009 - accuracy: 0.9664 - val_loss: 0.3944 - val_accuracy: 0.8888
Epoch 6/10
157/157 [==============================] - 29s 184ms/step - loss: 0.0651 - accuracy: 0.9788 - val_loss: 0.4523 - val_accuracy: 0.8902
Epoch 7/10
157/157 [==============================] - 29s 183ms/step - loss: 0.0435 - accuracy: 0.9866 - val_loss: 0.4840 - val_accuracy: 0.8872
Epoch 8/10
157/157 [==============================] - 28s 177ms/step - loss: 0.0261 - accuracy: 0.9919 - val_loss: 0.6312 - val_accuracy: 0.8873
Epoch 9/10
157/157 [==============================] - 29s 182ms/step - loss: 0.0287 - accuracy: 0.9911 - val_loss: 0.7780 - val_accuracy: 0.8845
Epoch 10/10
157/157 [==============================] - 31s 200ms/step - loss: 0.0107 - accuracy: 0.9973 - val_loss: 0.9143 - val_accuracy: 0.8835

The model trains much faster than the LSTM- and RNN-based networks and achieves a best accuracy of close to 90% on our test dataset. This simpler model beats all the other results without a huge number of parameters; with increased capacity, the results might improve further.
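To use the trained classifier on a new raw review, the same cleaning and encoding steps have to be applied before calling predict. A minimal sketch, assuming the clean_reviews and get_encoded_input helpers defined earlier and the trained CNN model are in scope (the review text below is made up)-

# Score a single new review with the trained model
new_review = "The plot was predictable, but the acting kept me hooked till the end."
encoded = np.array([get_encoded_input(clean_reviews(new_review))])   # shape (1, 500)
probability = model.predict(encoded)[0][0]
print('positive' if probability >= 0.5 else 'negative', probability)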


Summary (Sentiment Classification with Deep Learning)

In this article, we explored different ways of performing Sentiment Classification with Deep Learning architectures. We experimented with four different architectures for the task of sentiment classification and found that the 1D CNN-based model gives the best results with an approximately similar number of trainable parameters.

This comparison is not entirely fair, as our dataset was small and we did not test the full potential of each architecture with different settings (hyper-parameter tuning). Still, one thing is clear: 1D CNNs are good at capturing sequential dependencies and are faster and more efficient for such tasks.

Here is the comparison table-

Variant                Trainable Parameters    Accuracy (%)
Dense Network (MLP)    576,257                 87
RNN                    328,465                 83
LSTM                   353,365                 88
1D CNN                 354,755                 90

Github repo: https://github.com/kartikgill/SentimentAnalysis

Thanks for reading! I hope this article was helpful for you. Please share your comments and feedback with me. See you in the next article.


Read Next >>>

  1. Sentiment Analysis with Python: Bag of Words
  2. Sentiment Analysis with Python: TFIDF features
  3. Deep Learning with PyTorch: Introduction
  4. Deep Learning with PyTorch: First Neural Network
  5. 1D-CNN based Fully Convolutional Model for Handwriting Recognition

References

  1. Dataset Citation: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
  2. Downloaded from: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
