Richard S. Sutton

(10 articles)

Best-of-N

A strategy in AI that involves generating multiple outputs and selecting the best one based on a predefined criterion or scoring function.

Generality: 575
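
As a rough illustration of the Best-of-N entry above, the sketch below draws N candidates and keeps the one with the highest score. The names `best_of_n`, `generate`, and `score` are hypothetical, and the random-number generator and distance-based score are toy stand-ins for a real model and a real scoring function.

```python
import random

def best_of_n(generate, score, n):
    """Draw n candidates from generate() and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins (assumptions): candidates are random numbers in [0, 1),
# and the scoring function prefers values close to 0.5.
rng = random.Random(0)
best = best_of_n(lambda: rng.random(), lambda x: -abs(x - 0.5), n=8)
```

Larger values of `n` raise the chance of a high-scoring output at the cost of more generation calls.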

Actor-Critic Models

Reinforcement learning architecture with two components: an actor that selects actions and a critic that evaluates those actions, with the critic's evaluations used to improve the actor's policy.

Generality: 705

RNN (Recurrent Neural Network)

Class of neural networks where connections between nodes form a directed graph along a temporal sequence, enabling them to exhibit temporal dynamic behavior for a sequence of inputs.

Generality: 892

Temporal Difference Learning

A method in reinforcement learning that updates predictions based on the difference between successive predictions, rather than solely relying on final outcome errors.

Generality: 775
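
The "difference between successive predictions" in the Temporal Difference Learning entry above can be made concrete with a minimal TD(0) sketch. The function name `td0_update`, the dictionary-based value table, and the step size and discount values are all assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to values[state] in place; return the TD error."""
    # The prediction for the next state stands in for the unknown final outcome.
    next_value = 0.0 if terminal else values[next_state]
    td_error = reward + gamma * next_value - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical two-state value table.
values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.0, next_state="B")
```

The update moves the estimate for state `"A"` a small step toward `reward + gamma * values["B"]`, without waiting for the episode to finish.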

Q-Value

Measure used in RL to represent the expected cumulative future reward an agent can obtain by starting in a given state, taking a particular action, and following its policy thereafter.

Generality: 820
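
One common way to estimate the Q-values described above is the tabular Q-learning update, shown here as a minimal sketch. Q-learning is only one of several estimation methods; the function name `q_learning_update`, the `(state, action)` table keys, and the parameter values are assumptions for illustration.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move q[(state, action)] toward reward + gamma * max_a q[(next_state, a)]."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
q[("s1", "right")] = 2.0
new_q = q_learning_update(q, "s0", "right", reward=1.0,
                          next_state="s1", actions=["left", "right"])
```

Each entry in the table approximates the expected discounted return for one state-action pair.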

Incremental Learning

A method where AI systems continuously acquire new data and knowledge while retaining previously learned information without retraining from scratch.

Generality: 750

Policy Gradient Algorithm

Type of RL algorithm that optimizes the policy directly by computing gradients of expected rewards with respect to policy parameters.

Generality: 805
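
To illustrate optimizing a policy directly by following gradients of expected reward, the sketch below applies a REINFORCE-style update to a softmax policy on a toy multi-armed bandit. The bandit setup, the function names `softmax` and `reinforce_bandit`, and the learning rate are assumptions; practical implementations usually add a baseline and work on multi-step episodes.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=200, lr=0.5, seed=0):
    """Train softmax preferences on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters, one per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for a in range(len(theta)):
            # Gradient of log pi(action) with respect to theta[a].
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * reward * grad_log  # gradient ascent step
    return softmax(theta)

# Hypothetical bandit: arm 0 always pays 0.0, arm 1 always pays 1.0.
probs = reinforce_bandit([0.0, 1.0])
```

After training, the policy should place most of its probability on the arm with the higher reward.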

Policy Gradient

Class of algorithms in RL that optimizes the parameters of a policy directly through gradient ascent on expected future rewards.

Generality: 675

Transfer Capability

A feature of AI systems that allows acquired knowledge in one domain or task to be applied to another distinct but related domain or task.

Generality: 775

Precomputed Policy

A decision-making policy computed in advance, particularly in reinforcement learning, so that an AI system can select actions at decision time without further planning.

Generality: 550