Explaining Reinforcement Learning to your next-door-neighbor

July 28, 2020

An intuitive introduction to Reinforcement Learning

Reinforcement Learning (RL) is a very interesting sub-field of Machine Learning (ML). While other ML techniques rely on static input-output pairs to learn hidden rules and then apply those rules to unseen data to get the possible outcomes, a Reinforcement Learning algorithm learns to make the best decisions automatically over time.

RL techniques are widely used in solving puzzles and developing smart agents capable of defeating humans in hundreds of different games. Apart from this, RL has multiple practical applications, such as:

  1. Robotics: industrial automation
  2. Developing student training systems
  3. RL-based Neural Architecture Search (NAS)

Let’s learn how RL works — 

A quick peek at the contents:

  1. Why RL if we have other ML techniques?
  2. What is Reinforcement Learning?
  3. Complications of Reinforcement Learning
  4. Conclusion

1. Why RL if we have other ML techniques?

Machine Learning techniques like supervised learning and unsupervised learning learn from given historical data and are then deployed to produce results on unseen (future) data. The goodness of such models depends upon the quality of the training data. These models can fail abruptly when unseen examples (a new variety of data that was not present in the training set) come into the picture.

Reinforcement learning-based algorithms are able to address such issues. RL models are designed in such a way that they keep learning the variations in the data over time and keep performance high. Let’s learn more about RL.

2. What is Reinforcement Learning?

An RL-based algorithm learns to make the best decisions automatically over time. It learns from the mistakes made in the past and attempts to make the best decision at each point in time in the future. This approach of learning from experience is very similar to how humans learn and grow, an idea that draws RL closer to the purpose of Artificial Intelligence. Let’s dive into more details of RL.

Hope everyone can recollect memories of the old Snake game; if not, the following clip will definitely remind you:

https://gfycat.com/wildunevenhackee

Now let’s write down five magic words; the whole of RL will circle around these five terms:

a. Agent

b. Environment

c. Action

d. Observation

e. Reward

Let’s understand these five terms while relating to the snake-feeding game.

a & b. Agent and Environment

Every RL problem can be broken into two major blocks: 1. Agent, 2. Environment. An agent is something that can do a few things (a well-defined set of things), and the goal of our RL algorithm is to teach this agent to do those things in such a way that a particular objective (defined by the solver) is achieved. Everything apart from the agent is called the environment. Agents perform all their activities in the environment and change its state at each step.

Example: Relating to the snake game

Here, the snake is the agent, and the entire green playground along with the bait/food is the environment. The snake can do three things: walk straight, turn left, and turn right. The solver can define an objective like ‘eat as many baits as possible’ or ‘eat 100 baits’. Our trained, working RL algorithm should then instruct the snake to take appropriate actions so that the solver’s objective is achieved.
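
To make this split concrete, here is a minimal Python sketch of the two blocks. Everything in it (the class names, the action strings, the toy state) is made up purely for illustration; it is not a real game engine:

```python
import random

class SnakeEnvironment:
    """Everything except the snake: the playground, the bait, the rules."""
    def current_state(self):
        # e.g. positions of the snake's body and the bait (invented values)
        return {"snake": [(2, 3)], "bait": (5, 5)}

class SnakeAgent:
    """The snake: it can only pick from a well-defined set of actions."""
    ACTIONS = ["walk_straight", "turn_left", "turn_right"]

    def choose_action(self, state):
        # With no RL algorithm plugged in yet, the agent acts randomly.
        return random.choice(self.ACTIONS)
```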

c, d & e. Action, Observation, and Reward

As discussed, RL agents do certain things and can change the state of the environment at each step. The well-defined set of those things is called the action space of the given agent. At each step, the agent picks one action from the action space (randomly, if no RL algorithm is implemented) and performs it.

This action might change the environment in a certain way, and this change in the environment caused by the performed action is called the observation. Each action step is associated with some reward (i.e., a scalar value: 2, 5, 100, anything) and an observation. The reward is defined by the problem-solver in such a way that earning more reward takes the agent closer to the objective.

Summing up: at each step, the agent will perform an action, get some reward, and record the observation. The primary goal of our RL algorithm is to help the agent pick the best action at each step so that it gets a good reward every time and eventually completes the objective.

The observations and rewards collected from past actions help the RL algorithm understand the environment and decide the next best move for the agent. Most of the time, some supervised learning algorithm is used to decide the best move. Usually, the observations from games are screenshots of the environment, so Deep Learning algorithms (Convolutional Neural Networks) are used very frequently.
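
Put together, one interaction step looks like the tiny sketch below. The environment, its rewards, and the 50-step limit are all invented for illustration; real libraries such as OpenAI Gym expose a very similar reset/step interface:

```python
import random

ACTIONS = ["walk_straight", "turn_left", "turn_right"]

class ToySnakeEnv:
    def reset(self):
        self.steps = 0
        return "initial-screen"                      # the first observation

    def step(self, action):
        self.steps += 1
        reward = 1 if random.random() < 0.1 else 0   # pretend a bait is eaten ~10% of the time
        done = self.steps >= 50                      # pretend the episode ends after 50 steps
        observation = f"screen-after-step-{self.steps}"
        return observation, reward, done

env = ToySnakeEnv()
observation = env.reset()
total_reward, done = 0, False
while not done:
    action = random.choice(ACTIONS)                  # random agent: no RL yet
    observation, reward, done = env.step(action)
    total_reward += reward
print("Reward collected this episode:", total_reward)
```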

Example: Relating to the snake game

In our Snake game, the RL algorithm is supposed to give the snake an instruction (one of: go straight, turn left, turn right) at each step. These commands should help the snake eat enough baits so that our objective is completed. The RL model needs to keep one thing in mind: the snake mustn’t die before completing the objective, otherwise it’s Game Over. Once our game is over, the episode completes and we need to start over.

Episode

An episode is just another term in the RL vocabulary, marking one complete run of the task from the starting state to its end. If your episode ends before the objective is completed, your RL algorithm needs more tweaking/training/enhancement.
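
Reusing the toy environment and random agent from the sketch above, an episode is simply one reset-to-done run:

```python
# One episode = reset() ... step() ... until done is True
# (objective completed, or the snake died). Reuses the hypothetical
# ToySnakeEnv and ACTIONS from the earlier sketch.
for episode in range(10):
    observation, done, total_reward = env.reset(), False, 0
    while not done:
        action = random.choice(ACTIONS)
        observation, reward, done = env.step(action)
        total_reward += reward
    print(f"Episode {episode} ended with reward {total_reward}")
```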

Reinforcement Learning is so cool! 

Why don’t we apply RL in all our ML problems?

I guess everyone now has a fair understanding of how this whole RL thing works. The concept seems pretty simple and intuitive, but in truth there are a few hurdles we encounter while implementing such algorithms. Let’s learn more about them:

3. Complications of RL

Here are a few things that make the development of RL-based models complicated:

I. Suppose our agent keeps making mistakes and doesn’t gather any reward. Observations from such mistakes are not fruitful and may not show the agent how to earn reward going forward. The agent may get stuck in such situations, and RL might fail to solve such scenarios.

II. Imagine you are writing an RL algorithm to play and win chess. Chess is a particularly challenging game: your moves might not make sense at the start, but they might make a big difference deep into the game. In such problems, you can’t decide the reward for each step, so you can’t write a per-step reward function. There is only one reward, winning the game, and you get it only when the game is finished. This sparsity makes it difficult for RL agents to choose the best action at each step.
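
In code, such a sparse reward might look like the following sketch (the function and its signature are hypothetical, just to illustrate the problem):

```python
def chess_reward(game_over, winner, my_color):
    """Sparse reward: intermediate moves earn nothing; the only
    feedback arrives once the game is finished."""
    if not game_over:
        return 0                      # no sensible per-move score exists
    return 1 if winner == my_color else -1
```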

Before jumping into the third complication, let me first introduce two more important terms.

Story of Exploration/Exploitation

Imagine you have recently moved to a new city and there are thousands of dinner places around your house. Every evening you have a choice to make: should I explore a new place today, or eat Freddy’s delicious chicken wings? Since you already know Freddy’s is good, it’s never a bad idea to eat there again. But if you don’t explore newer restaurants, you will never discover more interesting places, or even the best ones. Exploration comes at a cost, though: you might need to go through many bad places before finding a good one, and you might not find a single good place at all. This kind of situation arises very frequently in real life, for example when changing jobs or changing smartphone brands.

III. Exploration/Exploitation dilemma in Reinforcement Learning

RL agents also face such situations. Exploring is necessary, since a good reward might be waiting somewhere unexplored; exploiting the already-studied behavior (through observations) is necessary too, otherwise your agent is no better than random. The solver always needs to find a balance between these two in order to design an efficient RL agent.
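
One classic way to strike this balance is the epsilon-greedy rule: with a small probability the agent explores a random action, and otherwise it exploits the best action known so far. A minimal sketch, with invented value estimates:

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """action_values: dict mapping each action to its estimated reward."""
    if random.random() < epsilon:
        # Explore: try something random; a better 'restaurant' may exist.
        return random.choice(list(action_values))
    # Exploit: stick with the best-known action (Freddy's, in our story).
    return max(action_values, key=action_values.get)

# Invented estimates learned from past observations:
values = {"walk_straight": 0.2, "turn_left": 0.5, "turn_right": 0.1}
print(epsilon_greedy(values))
```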

4. Conclusion

Yes, these complications have always been there, and they still are. But researchers are stubborn, and RL-based algorithms have seen drastic improvements over time. Reinforcement Learning is becoming a more and more interesting and active field of research. It’s time to find a few relevant business problems and start solving them efficiently using RL-based algorithms.

References:

  1. Lapan, M. (2018). Deep Reinforcement Learning Hands-On. Birmingham, UK: Packt Publishing.
  2. GIF from: https://gfycat.com/wildunevenhackee
  3. Image from: https://itnext.io/reinforcement-learning-with-q-tables-5f11168862c8

Thanks for reading! Kindly share your feedback/comments.

You might be interested in reading the second part:

Coming Soon
