Philipp Moritz
(2 articles)
2015
TRPO
Trust Region Policy Optimization
Policy-gradient RL algorithm that keeps policy updates stable by maximizing a surrogate objective subject to a bound on the KL divergence between the old and new policies (the trust region), thus preventing drastic policy changes.
Generality: 635
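As a rough sketch of what "optimizing within a trust region" means, the update is commonly written as the constrained problem below. The symbols are assumptions introduced here for illustration: \pi_\theta is the policy, \theta_{\text{old}} the parameters before the update, \hat{A} an advantage estimate, and \delta the trust-region size.

```latex
\max_{\theta} \;
\mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, \hat{A}(s,a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\!\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \big) \right] \le \delta
```

A smaller \delta keeps the new policy closer to the old one, trading update size for stability.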

2017
PPO
Proximal Policy Optimization
Policy-gradient RL algorithm that aims to balance ease of implementation, sample efficiency, and reliable performance by replacing TRPO's constrained update with a simpler first-order one, typically a clipped surrogate objective.
Generality: 670
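A minimal sketch of the clipped surrogate objective often used in PPO, written in plain Python so the arithmetic is visible. The function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions, not taken from any particular library.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies; advantages: advantage estimates for those actions.
    All names here are hypothetical, chosen for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the update once the ratio exceeds 1 + clip_eps; with a negative advantage, the unclipped term is kept whenever it is the more pessimistic one, so a harmful move away from the old policy is still penalized in full.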