Richard S. Sutton
(10 articles)
Best-of-N
A strategy in AI that involves generating multiple outputs and selecting the best one based on a predefined criterion or scoring function.
Generality: 575
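As a rough illustration of Best-of-N, the sketch below draws N candidates and keeps the one with the highest score. The names `best_of_n`, `generate`, and `score` are hypothetical, and the toy generator and scoring function are assumptions made only for this example.

```python
import random


def best_of_n(generate, score, n):
    """Draw n candidates from generate() and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical toy setup: candidates are random numbers in [0, 1), and the
# scoring function rewards closeness to a target value of 0.5.
rng = random.Random(0)
pick = best_of_n(lambda: rng.random(), lambda x: -abs(x - 0.5), n=8)
```

In practice the generator would typically be a model sampling call and the scoring function a reward model or task-specific check; both are left abstract here.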
Actor-Critic Models
Reinforcement learning architecture that includes two components: an actor that determines the actions to take and a critic that evaluates those actions to improve the policy.
Generality: 705
RNN
Recurrent Neural Network
Class of neural networks where connections between nodes form a directed graph along a temporal sequence, enabling them to exhibit temporal dynamic behavior for a sequence of inputs.
Generality: 892
Temporal Difference Learning
A method in reinforcement learning that updates predictions based on the difference between successive predictions, rather than solely relying on final outcome errors.
Generality: 775
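One simple instance of this idea is the tabular TD(0) update, sketched below. The function name `td0_update`, the dictionary-based value table, and the step size and discount values are assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to values[state] in place and return the TD error.

    The TD error is the gap between the bootstrapped target
    (reward + gamma * V(next_state)) and the current prediction V(state).
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example.
values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.5, next_state="B")
```

The prediction for state A moves toward the next state's prediction immediately, without waiting for the final outcome of the episode.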
Q-Value
Measure used in RL to represent the expected cumulative future reward that an agent can obtain by starting from a given state, choosing a particular action, and following its policy thereafter.
Generality: 820
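Q-values are commonly estimated from experience. The sketch below uses the tabular Q-learning update as one such estimator; the entry itself does not name a specific algorithm, so this choice, along with the function name `q_learning_update` and the toy states and actions, is an assumption for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
q[("s1", "right")] = 2.0
new_q = q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```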
Incremental Learning
A method where AI systems continuously acquire new data and knowledge while retaining previously learned information without retraining from scratch.
Generality: 750
Policy Gradient Algorithm
Type of RL algorithm that optimizes the policy directly by computing gradients of expected rewards with respect to policy parameters.
Generality: 805
Policy Gradient
Class of algorithms in RL that optimize the parameters of a policy directly through gradient ascent on expected future rewards.
Generality: 675
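To make the gradient-ascent idea concrete, here is a minimal REINFORCE-style sketch on a toy multi-armed bandit with a softmax policy. The bandit setting, the function names `softmax` and `reinforce_bandit`, the reward noise, and the hyperparameters are all assumptions chosen for illustration; no baseline is used, which keeps the code short but makes the updates noisier.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Run REINFORCE (no baseline) and return the final action probabilities."""
    rng = random.Random(seed)
    prefs = [0.0] * len(mean_rewards)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        # For a softmax policy, d log pi(action) / d prefs[k]
        # equals 1[k == action] - probs[k].
        for k in range(len(prefs)):
            indicator = 1.0 if k == action else 0.0
            prefs[k] += lr * reward * (indicator - probs[k])
    return softmax(prefs)


# Hypothetical two-armed bandit where the second arm pays more on average.
final_probs = reinforce_bandit([0.2, 1.0])
```

After training, the policy should place most of its probability on the higher-paying arm, since each update nudges the parameters in the direction of the sampled reward-weighted log-probability gradient.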
Transfer Capability
A feature of AI systems that allows acquired knowledge in one domain or task to be applied to another distinct but related domain or task.
Generality: 775
Precomputed Policy
A strategy computed in advance for decision-making processes in AI systems, particularly within reinforcement learning, to optimize future actions.
Generality: 550