John D. Williams
(2 articles)
1988
Temporal Difference Learning
A reinforcement learning method that updates predictions using the difference between successive predictions (the temporal-difference error), rather than relying solely on the error measured against the final outcome.
Generality: 775
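The idea of updating toward the next prediction rather than the final outcome can be illustrated with tabular TD(0), the simplest TD method, which applies V(s) ← V(s) + α[r + γV(s′) − V(s)] after every step. The sketch below is illustrative only: the function name `td0_update`, the three-state chain, and the choices α = 0.1 and γ = 1 are hypothetical assumptions, not part of the original entry.

```python
from typing import Dict, List, Optional, Tuple

# One transition: (state, reward, next_state); next_state is None at episode end.
Transition = Tuple[int, float, Optional[int]]


def td0_update(values: Dict[int, float], episode: List[Transition],
               alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Hypothetical helper: apply tabular TD(0) updates in place along one episode."""
    for state, reward, next_state in episode:
        # A terminal state's value is defined to be 0.
        next_value = 0.0 if next_state is None else values[next_state]
        # TD error: bootstrapped target minus the current prediction.
        td_error = reward + gamma * next_value - values[state]
        values[state] += alpha * td_error


# Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V = {s: 0.0 for s in range(3)}
for _ in range(500):
    td0_update(V, chain)
print(V)  # every state's value approaches 1.0
```

Note that state 0 never sees the reward directly; its estimate rises only because it is pulled toward the improving estimate of state 1, which is the defining property of temporal-difference learning.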
1992
Policy Gradient Algorithm
A type of reinforcement learning algorithm that optimizes the policy directly by estimating the gradient of expected reward with respect to the policy parameters.
Generality: 805
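As one concrete instance of a policy gradient method, the sketch below uses the REINFORCE estimator on a softmax policy, increasing each parameter by lr · reward · ∂log π(a)/∂θ. The two-armed bandit with fixed rewards, the function `reinforce_bandit`, and its step count, learning rate, and seed are hypothetical choices for illustration; practical implementations usually add a reward baseline to reduce variance.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards: List[float], steps: int = 2000,
                     lr: float = 0.1, seed: int = 0) -> List[float]:
    """Hypothetical example: REINFORCE on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters (action preferences)
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rewards[action]
        # Gradient of log softmax: indicator(b == action) - pi(b).
        for b in range(len(theta)):
            grad_log = (1.0 if b == action else 0.0) - probs[b]
            theta[b] += lr * reward * grad_log
    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
print(final_probs)  # probability of the rewarding arm approaches 1
```

The policy is never told which arm is better; the gradient step simply makes rewarded actions more probable, which is what optimizing the policy directly means here.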