Michael G. Jordan

(1 article)

Policy Gradient Algorithm

Type of RL algorithm that optimizes the policy directly by computing gradients of expected rewards with respect to policy parameters.

Generality: 805