John Schulman

(12 articles)

1952

RL
Reinforcement Learning

Type of ML where an agent learns to make decisions by performing actions in an environment to achieve a goal, guided by rewards.

Generality: 890

1956

Motor Learning

Process by which robots or AI systems acquire, refine, and optimize motor skills through experience and practice.

Generality: 675

1992

Policy Gradient

Class of algorithms in RL that optimizes the parameters of a policy directly through gradient ascent on expected future rewards.

Generality: 675
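
As a rough illustration of "gradient ascent on expected future rewards", the sketch below applies a REINFORCE-style update to a softmax policy on a toy two-armed bandit. The bandit setup, the function names (`softmax`, `reinforce_bandit`), and the hyperparameters are assumptions made for this example, not something the entry specifies.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=500, lr=0.1, seed=0):
    """Train a softmax policy on a deterministic bandit (hypothetical toy task).

    rewards[a] is the fixed reward for pulling arm a.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters, one preference per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for k in range(len(theta)):
            # d/d theta_k of log pi(action) for a softmax policy
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            # gradient ascent step: reward-weighted score function
            theta[k] += lr * reward * grad_log
    return softmax(theta)


final_probs = reinforce_bandit([1.0, 0.0])
```

With these settings the policy should shift nearly all of its probability onto the rewarded arm. Practical implementations usually subtract a baseline from the reward to reduce variance, which this sketch omits.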

2000

IRL
Inverse Reinforcement Learning

Technique in which an algorithm learns the underlying reward function of an environment based on observed behavior from an agent, essentially inferring the goals an agent is trying to achieve.

Generality: 658

2013

DRL
Deep Reinforcement Learning

Combines neural networks with a reinforcement learning framework, enabling AI systems to learn optimal actions through trial and error to maximize a cumulative reward.

Generality: 855
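
The "cumulative reward" in this definition is commonly formalized as a discounted return. The sketch below computes it for one episode; the function name `discounted_return` and the discount factor `gamma` are assumptions for illustration, since the entry does not fix a particular formulation.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step is reward + gamma * (return of the rest).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

In a deep RL setting, a neural network would be trained to choose actions that make this quantity large in expectation.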

2015

TRPO
Trust Region Policy Optimization

Advanced algorithm used in RL to ensure stable and reliable policy updates by optimizing within a trust region, thus preventing drastic policy changes.

Generality: 635
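
One way to picture the trust region is as a cap on how far the new action distribution may move from the old one, measured by KL divergence. The sketch below only checks that constraint for a single discrete distribution; it is not the full TRPO update, and the names `kl_divergence`, `within_trust_region`, and the threshold `max_kl` are assumptions for this example.

```python
import math


def kl_divergence(old_probs, new_probs):
    """KL(old || new) for two discrete distributions over the same actions.

    Assumes every entry of new_probs is strictly positive.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(old_probs, new_probs)
        if p > 0.0
    )


def within_trust_region(old_probs, new_probs, max_kl=0.01):
    """True if the proposed policy stays within the KL budget."""
    return kl_divergence(old_probs, new_probs) <= max_kl


small_step_ok = within_trust_region([0.5, 0.5], [0.51, 0.49])
large_step_ok = within_trust_region([0.5, 0.5], [0.9, 0.1])
```

Here the small change passes the check and the drastic one is rejected, which is the behavior the definition describes.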

2015

DQN
Deep Q-Networks

RL technique that combines Q-learning with deep neural networks to enable agents to learn how to make optimal decisions from high-dimensional sensory inputs.

Generality: 853
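
The Q-learning half of this combination can be sketched as the bootstrapped target that a Q-network is trained to match. In the example below, `next_q_values` stands in for the network's output on the next state; that name, `td_target`, and the `gamma` default are assumptions for illustration.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values: estimated Q-values for each action in the next state
    (in a DQN these would come from a neural network).
    """
    if done:
        # No future value once the episode has ended.
        return reward
    return reward + gamma * max(next_q_values)


target = td_target(1.0, [0.5, 2.0], gamma=0.9)
terminal_target = td_target(1.0, [0.5, 2.0], gamma=0.9, done=True)
```

Training then reduces the gap between the network's prediction for the chosen action and this target.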

2016

Move 37

Pivotal move made by AlphaGo in its second game against Go champion Lee Sedol, which showcased the superior strategic capabilities of AI in the game of Go.

Generality: 140

2016

RLHF
Reinforcement Learning from Human Feedback

Technique that combines reinforcement learning (RL) with human feedback to guide the learning process towards desired outcomes.

Generality: 625

2016

Sample Efficiency

Ability of an ML model to achieve high performance with a relatively small number of training samples.

Generality: 815

2017

PPO
Proximal Policy Optimization

RL algorithm that aims to balance ease of implementation, sample efficiency, and reliable performance by using a simpler but effective update method for policy optimization.

Generality: 670
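
The "simpler but effective update method" is most often associated with a clipped surrogate objective. The sketch below evaluates that objective for a single sample, assuming the clipped variant is the one meant here; `clipped_surrogate`, `ratio`, `advantage`, and `clip_eps` are names chosen for this example.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped objective (to be maximized).

    ratio: new_policy_prob / old_policy_prob for the action taken
    advantage: estimate of how much better the action was than average
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - clip_eps, 1 + clip_eps] in the favorable direction.
    return min(ratio * advantage, clipped_ratio * advantage)


gain_capped = clipped_surrogate(1.5, 2.0)    # positive advantage, ratio too high
loss_kept = clipped_surrogate(1.5, -1.0)     # negative advantage is not clipped away
```

With a positive advantage the gain is capped once the ratio exceeds `1 + clip_eps`, while a move that makes a bad action more likely is still penalized in full.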

2021

Text-to-Code Model

AI model designed to translate natural-language descriptions into executable code snippets, automating parts of software development and assisting developers.

Generality: 665
