Paul Christiano
(6 articles)
2000
Control Problem
Challenge of ensuring that highly advanced AI systems act in alignment with human values and intentions.
Generality: 845
2000
AI Safety
Field of research aimed at ensuring AI technologies are beneficial and do not harm humanity.
Generality: 870
2016
AI Failure Modes
Scenarios in which AI systems fail to perform as expected or produce unintended consequences.
Generality: 714
2016
RLHF
Reinforcement Learning from Human Feedback
Technique that combines reinforcement learning (RL) with human feedback to guide the learning process towards desired outcomes.
Generality: 625
2016
Alignment
Process of ensuring that an AI system's goals and behaviors are consistent with human values and ethics.
Generality: 790
2022
PDoom
Probability of an existential catastrophe, often discussed within the context of AI safety and risk assessment.
Generality: 550