John D. Williams

(2 articles)

1988

Temporal Difference Learning

A reinforcement learning method that updates predictions using the difference between successive predictions (the temporal-difference error), rather than waiting for the final outcome to compute the error.

Generality: 775
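To make the "difference between successive predictions" concrete, here is a minimal sketch of tabular TD(0), one specific TD method, applied to a hypothetical five-state random walk. The environment is an assumption for illustration: states 1 to 5 are non-terminal, each step moves left or right with equal probability, and the only reward is 1 for exiting on the right. Under these assumptions the true state values are 1/6, 2/6, ..., 5/6, so the estimates can be checked. The function name `td0_random_walk` and its parameters are illustrative, not from any library.

```python
import random


def td0_random_walk(num_episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values of a hypothetical 5-state random walk with tabular TD(0)."""
    rng = random.Random(seed)
    n = 5                      # non-terminal states 1..n; states 0 and n+1 are terminal
    V = [0.0] * (n + 2)        # terminal values stay at 0
    for s in range(1, n + 1):
        V[s] = 0.5             # neutral initial guess for non-terminal states
    for _ in range(num_episodes):
        s = 3                  # every episode starts in the middle state
        while s not in (0, n + 1):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0   # reward only on exiting right
            # TD error: new prediction (r + gamma * V[s_next]) minus old prediction V[s]
            delta = r + gamma * V[s_next] - V[s]
            V[s] += alpha * delta                 # move the old prediction toward the new one
            s = s_next
    return V[1:n + 1]
```

Calling `td0_random_walk()` should return five values close to 1/6 through 5/6. Note that each update happens at every step using the next state's current estimate; no update waits for the episode's final outcome, which is the distinction the definition above draws.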

1992

Policy Gradient Algorithm

A type of reinforcement learning (RL) algorithm that optimizes the policy directly by computing the gradient of the expected reward with respect to the policy parameters.

Generality: 805
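As an illustration of following the gradient of expected reward, the sketch below uses the REINFORCE estimator, one well-known policy-gradient method, with a running-average reward baseline to reduce variance. It assumes a hypothetical three-armed bandit (each episode is a single action) with Gaussian rewards and a softmax policy over per-arm preferences `theta`. For a softmax policy, the gradient of log pi(a) with respect to `theta[b]` is 1[a == b] - pi(b), which the update uses directly. The names `reinforce_bandit`, `arm_means`, and `baseline_lr` are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, baseline_lr=0.05, seed=0):
    """REINFORCE with a reward baseline on a one-step bandit; returns final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters (softmax preferences)
    baseline = 0.0                  # running average of observed rewards
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]   # sample an action from the policy
        reward = rng.gauss(arm_means[a], 1.0)                   # hypothetical noisy reward
        advantage = reward - baseline                           # return minus baseline
        baseline += baseline_lr * (reward - baseline)
        # Gradient ascent on expected reward: theta += lr * advantage * grad log pi(a)
        for b in range(len(theta)):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[b] += lr * advantage * grad_log_pi
    return softmax(theta)
```

For example, `reinforce_bandit([0.0, 1.0, 0.0])` should end with most of the probability on arm 1, the arm with the highest mean reward. The parameters are adjusted directly from sampled rewards, without first learning a value function, which is what "optimizes the policy directly" means here.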