John Schulman

(12 articles)
RL (Reinforcement Learning)
1952

RL
Reinforcement Learning

Type of ML where an agent learns to make decisions by performing actions in an environment to achieve a goal, guided by rewards.

Generality: 890

Motor Learning
1956

Motor Learning

Process by which robots or AI systems acquire, refine, and optimize motor skills through experience and practice.

Generality: 675

Policy Gradient
1992

Policy Gradient

Class of algorithms in RL that optimizes the parameters of a policy directly through gradient ascent on expected future rewards.

Generality: 675
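
The policy gradient idea can be illustrated with a minimal sketch, assuming a softmax policy over the arms of a toy bandit and the REINFORCE update without a baseline. The function names, arm means, learning rate, and noise level are all hypothetical choices made for illustration; this is not a reference implementation.

```python
import math
import random

def softmax(prefs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """Gradient ascent on expected reward for a softmax policy (toy setup)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # policy parameters, one per arm
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_means[action] + rng.gauss(0.0, 0.1)  # noisy reward (assumed)
        # For a softmax policy, d log pi(action) / d prefs[i] = 1[i == action] - probs[i].
        for i in range(len(prefs)):
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad_log
    return softmax(prefs)

final_probs = reinforce_bandit([0.2, 1.0])
```

With these assumed settings the policy should shift most of its probability toward the higher-paying arm; practical implementations usually subtract a baseline from the reward to reduce variance.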

IRL (Inverse Reinforcement Learning)
2000

IRL
Inverse Reinforcement Learning

Technique in which an algorithm infers the reward function an agent is optimizing from that agent's observed behavior, in effect recovering the goals the agent is trying to achieve.

Generality: 658

DRL (Deep Reinforcement Learning)
2013

DRL
Deep Reinforcement Learning

Combines neural networks with a reinforcement learning framework, enabling AI systems to learn optimal actions through trial and error to maximize a cumulative reward.

Generality: 855

TRPO (Trust Region Policy Optimization)
2015

TRPO
Trust Region Policy Optimization

RL algorithm that keeps policy updates stable by optimizing within a trust region, a bound on the KL divergence between the old and new policies, thus preventing drastic policy changes.

Generality: 635

DQN (Deep Q-Networks)
2015

DQN
Deep Q-Networks

RL technique that combines Q-learning with deep neural networks to enable agents to learn how to make optimal decisions from high-dimensional sensory inputs.

Generality: 853
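
As a hedged sketch of the Q-learning half of this combination, the helper below computes the one-step bootstrap target that the network's prediction is regressed toward. The name `td_target`, its arguments, and the discount value are hypothetical; the neural network, replay buffer, and target network used in practice are omitted, and `next_q_values` simply stands in for the network's estimates at the next state.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping past the end of an episode.
        return reward
    return reward + gamma * max(next_q_values)

# Example: reward 1.0, next-state action values [0.5, 2.0], discount 0.9.
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
terminal_target = td_target(1.0, [0.5, 2.0], done=True, gamma=0.9)
```

Training would then minimize the squared difference between this target and the predicted value of the action actually taken.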

Move 37
2016

Move 37

Pivotal move made by AlphaGo in its second game against Go champion Lee Sedol, which showcased the superior strategic capabilities of AI in the game of Go.

Generality: 140

RLHF (Reinforcement Learning from Human Feedback)
2016

RLHF
Reinforcement Learning from Human Feedback

Technique that combines reinforcement learning (RL) with human feedback, typically by training a reward model on human preference judgments and optimizing the policy against it, to guide learning towards desired outcomes.

Generality: 625

Sample Efficiency
2016

Sample Efficiency

Ability of an ML model to achieve high performance with a relatively small number of training samples.

Generality: 815

PPO (Proximal Policy Optimization)
2017

PPO
Proximal Policy Optimization

RL algorithm that aims to balance ease of implementation, sample efficiency, and reliable performance by replacing TRPO's constrained optimization with a simpler clipped surrogate objective that keeps each policy update close to the previous policy.

Generality: 670
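
The widely used clipped variant of PPO can be sketched per sample as below. This is a minimal illustration, assuming `ratio` is the new policy's probability of the action divided by the old policy's, `advantage` is an advantage estimate computed elsewhere, and `clip_eps=0.2` is an illustrative default; the function name is hypothetical.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

gain_capped = ppo_clip_objective(1.5, 1.0)    # positive advantage, ratio above the range
loss_kept = ppo_clip_objective(0.5, -1.0)     # negative advantage, ratio below the range
unclipped = ppo_clip_objective(1.0, 2.0)      # ratio inside the range
```

The objective is maximized, averaged over a batch; the clipping takes the place of an explicit trust-region constraint.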

Text-to-Code Model
2021

Text-to-Code Model

AI model designed to translate natural language descriptions into executable code, automating parts of software development and assisting developers.

Generality: 665