Class of reinforcement learning (RL) algorithms that optimize the parameters of a policy directly, by gradient ascent on the expected return (the expected cumulative future reward).
Generality: 675
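As an illustration, here is a minimal sketch of REINFORCE, one classic policy gradient method, applied to a hypothetical two-armed bandit (a one-step RL problem). It assumes a softmax policy over per-action preferences `theta`. For that policy the gradient of log pi(a) with respect to `theta[k]` is 1[k == a] - pi(k), and each parameter moves along that gradient scaled by the observed reward. The bandit, its reward means, the learning rate, and the running-average baseline (a common variance-reduction trick) are illustrative assumptions, not part of the definition above.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(true_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical bandit with Gaussian rewards.

    Assumptions (illustrative only): rewards ~ N(true_means[a], 1),
    a softmax policy over theta, and a running-average reward baseline.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(true_means)  # policy parameters
    baseline = 0.0                   # running average of past rewards
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(true_means[a], 1.0)
        # Advantage uses the baseline from *before* this reward.
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        for k in range(len(theta)):
            # Gradient of log softmax: 1[k == a] - pi(k).
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            # Gradient *ascent* on expected reward.
            theta[k] += lr * advantage * grad_log_pi
    return softmax(theta)


if __name__ == "__main__":
    # Arm 1 pays more on average, so the policy should come to favor it.
    print(reinforce_bandit([0.0, 1.0]))
```

Subtracting a baseline does not change the expected gradient. It only lowers the variance of each update. In multi-step problems the per-step reward is replaced by the return that follows each action.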