An introduction to Q-Learning: reinforcement learning


by ADL


This article is the second part of my “Deep Reinforcement Learning” series. The complete series will be available both in articles on Medium and in videos on my YouTube channel.

In the first part of the series, we learned the basics of reinforcement learning.

Q-learning is a value-based learning algorithm in reinforcement learning. In this article, we'll learn about Q-Learning in detail:

  • What is Q-Learning?
  • The mathematics behind Q-Learning
  • Implementation using Python

Q-Learning — a simple overview

Let’s say that a robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible.

The scoring/reward system is as below (a short code sketch follows the list):

  1. The robot loses 1 point at each step. This is done so that the robot takes the shortest path and reaches the goal as fast as possible.
  2. If the robot steps on a mine, the point loss is 100 and the game ends.
  3. If the robot gets power ⚡️, it gains 1 point.
  4. If the robot reaches the end goal, the robot gets 100 points.
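
To make this concrete, here is how that reward scheme might be written down in code. This is a minimal sketch; the constant names are my own, not from the article:

```python
# Hypothetical reward constants for the maze game described above.
STEP_REWARD = -1     # the robot loses 1 point on every move
MINE_REWARD = -100   # stepping on a mine: -100 and the game ends
POWER_REWARD = 1     # picking up power: +1
GOAL_REWARD = 100    # reaching the end goal: +100
```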

Now, the obvious question is: How do we train a robot to reach the end goal with the shortest path without stepping on a mine?

An introduction to Q-Learning: reinforcement learning (3)

So, how do we solve this?

Introducing the Q-Table

Q-Table is just a fancy name for a simple lookup table where we calculate the maximum expected future reward for each action at each state. Basically, this table will guide us to the best action at each state.


There are four possible actions at each non-edge tile. When the robot is at a given state, it can move up, down, right, or left.

So, let’s model this environment in our Q-Table.

In the Q-Table, the columns are the actions and the rows are the states.


Each Q-table score will be the maximum expected future reward that the robot will get if it takes that action at that state. This is an iterative process, as we need to improve the Q-Table at each iteration.
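
Once the table holds meaningful values, "guiding us to the best action" is just a row lookup. A minimal NumPy sketch (the table values and state index are made up for illustration):

```python
import numpy as np

# Hypothetical Q-table: one row per state, one column per action
# (columns: up, down, right, left).
q_table = np.array([
    [0.0,  1.2, -0.5, 0.3],   # state 0
    [0.7, -0.2,  2.1, 0.0],   # state 1
])

state = 1
best_action = int(np.argmax(q_table[state]))  # highest-scoring action
print(best_action)  # -> 2 (move right)
```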

But the questions are:

  • How do we calculate the values of the Q-table?
  • Are the values available or predefined?

To learn each value of the Q-table, we use the Q-Learning algorithm.

Mathematics: the Q-Learning algorithm

Q-function

The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
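
The original figure showed this equation as an image. A standard way to write the Q-function, as the expected discounted sum of future rewards with discount factor γ (the exact notation is my reconstruction, since the figure is not reproduced here), is:

```latex
Q(s_t, a_t) = \mathbb{E}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid s_t, a_t \right]
```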


Using the above function, we get the values of Q for the cells in the table.

When we start, all the values in the Q-table are zeros.

There is an iterative process of updating the values. As we start to explore the environment, the Q-function gives us better and better approximations by continuously updating the Q-values in the table.

Now, let’s understand how the updating takes place.

Introducing the Q-learning algorithm process


The algorithm is a loop of a few simple steps. Let's understand each of these steps in detail.

Step 1: initialize the Q-Table

We will first build a Q-table. There are n columns, where n = number of actions, and m rows, where m = number of states. We will initialize all the values at 0.


In our robot example, we have four actions (a=4) and five states (s=5). So we will build a table with four columns and five rows.
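
As a sketch, initializing such a table in NumPy might look like this (the variable names are illustrative):

```python
import numpy as np

n_actions = 4   # up, down, right, left
n_states = 5    # the five states in our robot example

# m x n table of zeros: rows are states, columns are actions.
q_table = np.zeros((n_states, n_actions))
print(q_table.shape)  # -> (5, 4)
```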

Steps 2 and 3: choose and perform an action

This combination of steps runs for an undefined amount of time: the steps repeat until we stop the training, or until the training loop terminates as defined in the code.

We will choose an action (a) in the state (s) based on the Q-Table. But, as mentioned earlier, when the episode initially starts, every Q-value is 0.

So now the concept of the exploration/exploitation trade-off comes into play.

We’ll use something called the epsilon-greedy strategy.

In the beginning, the epsilon rate will be higher. The robot will explore the environment and randomly choose actions. The logic behind this is that the robot does not know anything about the environment yet.

As the robot explores the environment, the epsilon rate decreases and the robot starts to exploit the environment.

During the process of exploration, the robot progressively becomes more confident in estimating the Q-values.

For the robot example, there are four actions to choose from: up, down, left, and right. We are starting the training now — our robot knows nothing about the environment. So the robot chooses a random action, say right.
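
A minimal epsilon-greedy selection could look like the sketch below (the function and decay schedule are my own illustration, not the article's code):

```python
import numpy as np

def choose_action(q_table, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    n_actions = q_table.shape[1]
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)    # explore: random action
    return int(np.argmax(q_table[state]))      # exploit: best known action

# Epsilon usually starts near 1 (mostly exploring) and decays over time,
# e.g. epsilon = max(0.01, epsilon * 0.995) after each episode.
```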


We can now update the Q-values for being at the start and moving right using the Bellman equation.

Steps 4 and 5: evaluate

Now we have taken an action and observed an outcome and reward. We need to update the function Q(s,a).
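
The figure here showed the update rule as an image. A standard form of the Q-learning update, with learning rate α and discount factor γ (my reconstruction, since the figure is not reproduced), is:

```latex
Q^{\text{new}}(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the difference between the new estimate (the immediate reward plus the discounted best future value) and the current estimate.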


In the case of the robot game, to reiterate, the scoring/reward structure is:

  • step = -1
  • power = +1
  • mine = -100
  • end = +100

We will repeat this again and again until the learning is stopped. In this way the Q-Table will be updated.

Python implementation of Q-Learning

The concept and code implementation are explained in my video.
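
Since the full implementation lives in the video, here is only a compact sketch of what a tabular Q-learning training loop typically looks like. The `env.reset()`/`env.step()` interface is hypothetical (modeled on common RL environment APIs), and all names are my own:

```python
import numpy as np

def train(env, n_states, n_actions, episodes=1000,
          alpha=0.1, gamma=0.99, epsilon=1.0,
          epsilon_min=0.01, epsilon_decay=0.995):
    """Tabular Q-learning against a hypothetical `env` object."""
    q_table = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()           # assumed to return an integer state
        done = False
        while not done:
            # Epsilon-greedy action selection
            if np.random.random() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(q_table[state]))

            # Assumed to return (next_state, reward, done)
            next_state, reward, done = env.step(action)

            # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            best_next = np.max(q_table[next_state]) if not done else 0.0
            td_error = reward + gamma * best_next - q_table[state, action]
            q_table[state, action] += alpha * td_error

            state = next_state

        # Explore less as training progresses
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return q_table
```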

Subscribe to my YouTube channel for more AI videos: ADL.

At last… let us recap

  • Q-Learning is a value-based reinforcement learning algorithm which is used to find the optimal action-selection policy using a Q function.
  • Our goal is to maximize the value function Q.
  • The Q table helps us to find the best action for each state.
  • It helps to maximize the expected reward by selecting the best of all possible actions.
  • Q(state, action) returns the expected future reward of that action at that state.
  • This function can be estimated using Q-Learning, which iteratively updates Q(s,a) using the Bellman equation.
  • Initially we explore the environment and update the Q-Table. When the Q-Table is ready, the agent will start to exploit the environment and start taking better actions.

Next time we’ll work on a deep Q-learning example.

Until then, enjoy AI!


If you liked my article, please click the 👏 to help me stay motivated to write articles. Please follow me on Medium and other social media.


If you have any questions, please let me know in a comment below or on Twitter.

Subscribe to my YouTube channel for more tech videos.


If this article was helpful, share it.

Learn to code for free. freeCodeCamp's open source curriculum has helped more than 40,000 people get jobs as developers. Get started
