Q-value

Measure used in reinforcement learning (RL) to represent the expected future rewards an agent can obtain, starting from a given state and taking a particular action.

In reinforcement learning, the Q-value, or action value, quantifies the expected utility of taking a specific action in a given state and following a particular policy thereafter. This measure is central to how agents choose actions that maximize cumulative reward. Q-values are estimated iteratively by algorithms such as Q-learning and Deep Q-Networks (DQN), which rely on the Bellman equation: the value of a state-action pair is adjusted toward the reward received for the action plus the highest Q-value of the next state, discounted by a factor that weights the importance of future rewards.
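Written out, the tabular Q-learning update described above takes the form

    Q(s, a) ← Q(s, a) + α [ r + γ · max_a' Q(s', a') − Q(s, a) ]

where α is the learning rate, γ is the discount factor, r is the reward received, and s' is the next state. The following is a minimal sketch of this update in Python; the array layout, variable names, and example sizes are illustrative assumptions rather than a reference implementation:

    import numpy as np

    # Illustrative sizes; in practice these come from the environment.
    num_states, num_actions = 10, 4
    q_table = np.zeros((num_states, num_actions))  # one row per state, one column per action

    def q_learning_update(q_table, state, action, reward, next_state,
                          alpha=0.1, gamma=0.99):
        # Highest Q-value achievable from the next state (greedy estimate of future value).
        best_next = np.max(q_table[next_state])
        # Bellman target: immediate reward plus discounted future value.
        td_target = reward + gamma * best_next
        # Move the current estimate a step of size alpha toward the target.
        q_table[state, action] += alpha * (td_target - q_table[state, action])
        return q_table

    # Example: update after observing a transition from state 2 via action 1,
    # receiving reward 1.0 and landing in state 3.
    q_table = q_learning_update(q_table, state=2, action=1, reward=1.0, next_state=3)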

Historical overview: The concept of the Q-value was introduced alongside the Q-learning algorithm in Christopher Watkins's 1989 PhD thesis. It gained prominence in the early 1990s as a foundational element of many reinforcement learning algorithms.

Key contributors: Christopher Watkins is credited with developing the Q-learning algorithm, which is built around the iterative estimation of Q-values. His work laid the groundwork for many later advances in reinforcement learning, including methods for updating Q-values efficiently in complex environments.