Introduction To Reinforcement Learning Summary — Part 1
3 min read · Aug 28, 2022
- Quantify P(s, a, s′): the probability of moving from state s to new state s′ after taking action a.
- The goal of reinforcement learning is for the algorithm to learn an optimal policy and take an optimal action when presented with state s.
- Over time, we reinforce actions that lead to good outcomes and penalise actions that lead to poor outcomes.
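As a rough illustration of what quantifying P(s, a, s′) can look like, the sketch below estimates it by counting observed transitions. The function name `estimate_transition_probs` and the toy states and actions are assumptions made for this example, not part of the original summary, and counting is only one simple way to obtain these probabilities.

```python
from collections import Counter, defaultdict


def estimate_transition_probs(transitions):
    """Estimate P(s, a, s') from observed (state, action, next_state) tuples."""
    # For every (state, action) pair, count how often each next state followed.
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(state, action)][next_state] += 1

    # Normalise the counts so the probabilities for each (state, action) sum to 1.
    probs = {}
    for (state, action), next_counts in counts.items():
        total = sum(next_counts.values())
        for next_state, n in next_counts.items():
            probs[(state, action, next_state)] = n / total
    return probs


# Hypothetical logged experience: (state, action, next_state)
observed = [
    ("low", "heat", "high"),
    ("low", "heat", "high"),
    ("low", "heat", "low"),
    ("high", "cool", "low"),
]
P = estimate_transition_probs(observed)
print(P[("low", "heat", "high")])  # two of the three "low, heat" transitions ended in "high"
```

With more logged experience, the estimates for each state and action pair should become more reliable.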
Optimal Policy
- It aims to maximise the average reward over time.
- It should account for both the outcomes and the costs of an action.
- It is non-myopic: it considers the immediate reward as well as future scenarios.
- Near-term impacts carry more weight than longer-term ones.
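One common way to weight near-term rewards more heavily than later ones is a discount factor. The summary does not name a specific scheme, so the sketch below is an assumption: `discounted_return` and the value `gamma=0.9` are illustrative choices, not the author's method.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards so that a reward t steps ahead is scaled by gamma ** t."""
    total = 0.0
    # Work backwards: each step adds its reward plus the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# The same reward of 10 counts for more when it arrives sooner.
early = discounted_return([10, 0, 0])
late = discounted_return([0, 0, 10])
print(early, late)
```

A `gamma` close to 1 makes the policy more far-sighted, while a `gamma` close to 0 makes it focus almost entirely on the immediate reward.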
Reinforcement Learning Solution Setup
- Discretise each continuous value in the current state into n bins.
- Apply the same discretisation logic to each action as well.
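The discretisation step above can be sketched as follows, assuming equal-width bins over a known value range. The helper names `make_bin_edges` and `discretise`, and the ranges used, are hypothetical; the summary does not say how the bin boundaries are chosen.

```python
from bisect import bisect_right


def make_bin_edges(low, high, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins over [low, high]."""
    width = (high - low) / n_bins
    return [low + i * width for i in range(1, n_bins)]


def discretise(value, edges):
    """Map a continuous value to a bin index between 0 and len(edges)."""
    # Values below the first edge fall in bin 0; values above the last edge
    # fall in the final bin, so out-of-range inputs are clamped.
    return bisect_right(edges, value)


# Hypothetical state with two continuous features, each with its own range.
state_edges = [make_bin_edges(0.0, 10.0, 5), make_bin_edges(0.0, 1.0, 4)]
state = (3.2, 0.75)
discrete_state = tuple(discretise(v, e) for v, e in zip(state, state_edges))

# The same logic applied to a continuous action value.
action_edges = make_bin_edges(-1.0, 1.0, 4)
discrete_action = discretise(0.3, action_edges)

print(discrete_state, discrete_action)
```

The resulting integer tuples can serve as keys in a lookup table. A larger n gives finer resolution but more state and action combinations to learn.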