Introduction To Reinforcement Learning Summary — Part 2

LZP Data Science
Sep 4, 2022

In Part 1 of this introduction to reinforcement learning, we looked at the underlying logic that lets an algorithm learn an optimal policy, so that it takes better actions that lead to good outcomes.

We also looked at the Q-function. It estimates the value of taking a given action in a given state, and that estimate is adjusted after the action is performed, based on the reward it produced.
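Here is a minimal sketch of that kind of update, assuming a tabular Q-function stored as a dictionary and a hypothetical learning rate `alpha`. The update rule in Part 1 may be written differently. The function name `myopic_q_update` and the state/action labels are illustrative only. The key point is that the update target is just the immediate reward.

```python
from collections import defaultdict

def myopic_q_update(q, state, action, reward, alpha=0.1):
    """Move Q(state, action) a step of size alpha toward the immediate reward.

    Hypothetical myopic rule: the target is only `reward`, so nothing
    about later states or later rewards is taken into account.
    """
    current = q[(state, action)]
    q[(state, action)] = current + alpha * (reward - current)
    return q[(state, action)]

# Q-table: unseen (state, action) pairs start at 0.0
q = defaultdict(float)
myopic_q_update(q, "sick", "treat", reward=1.0)
print(q[("sick", "treat")])  # 0.1
```

Repeating the same action keeps pulling the estimate toward that single immediate reward. This is exactly the short-sightedness discussed next.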

However, the logic from Part 1 has a limitation. It is myopic: it considers only the immediate reward and ignores what happens afterward.

Myopic behavior is something we want to avoid in reinforcement learning. For example:

  • A treatment might get rid of a disease immediately but lead to significant health complications in the long run, which makes it counterproductive overall.

This post looks at how to extend that logic into a non-myopic policy, in which the optimal action accounts for both immediate and subsequent rewards.
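To make the contrast concrete, here is a minimal sketch built on the treatment example. It assumes a discount factor `gamma`, which is one common way to weigh future rewards against immediate ones. It is not necessarily the exact approach this series goes on to use. The reward sequences and the names `discounted_return` and `best_action` are hypothetical. Setting `gamma` to 0 reproduces the myopic behaviour, because only the first reward counts.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t for t = 0, 1, 2, ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical reward sequences for the treatment example
outcomes = {
    "quick_cure": [10, -5, -5, -5],  # big immediate payoff, complications later
    "slow_cure": [2, 3, 3, 3],       # smaller immediate payoff, no complications
}

def best_action(outcomes, gamma):
    """Pick the action whose reward sequence has the highest discounted return."""
    return max(outcomes, key=lambda a: discounted_return(outcomes[a], gamma))

print(best_action(outcomes, gamma=0.0))  # quick_cure (myopic)
print(best_action(outcomes, gamma=0.9))  # slow_cure (looks ahead)
```

With `gamma=0.9`, the quick cure's later penalties outweigh its large first reward: about -2.2 versus about 9.3 for the slow cure. The preferred action therefore flips once future rewards are counted.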
