Last updated: April 5, 2026 · Safety & Alignment · by Daniel Ashford
What is Alignment?
The challenge of making AI systems behave in accordance with human values.
Definition
Alignment refers to ensuring AI systems behave in accordance with human values, intentions, and expectations. An aligned model consistently produces helpful, honest, and harmless outputs — even in edge cases.
How It Works
Current techniques include reinforcement learning from human feedback (RLHF), Constitutional AI, red-teaming, and scalable oversight. The LLM Judge Index safety dimension partially measures alignment quality. Alignment is widely regarded as one of the central open problems in AI safety.
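To make one of these techniques concrete: RLHF typically begins by training a reward model on pairs of responses, where a human has marked one response as preferred. A common objective is the Bradley-Terry pairwise loss, which pushes the reward of the preferred response above the rejected one. The sketch below illustrates only that loss; the function name and example reward values are illustrative, not from any particular library.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in RLHF reward modeling:
    -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the
    human-preferred response higher, and large when it doesn't.
    """
    margin = reward_chosen - reward_rejected
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)

# Reward model agrees with the human label: small loss.
print(preference_loss(2.0, 0.5))
# Reward model disagrees with the human label: large loss.
print(preference_loss(0.5, 2.0))
```

Once trained, the reward model scores candidate outputs, and the policy model is optimized (e.g., with PPO) to produce responses that score highly while staying close to its original behavior.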
Example
A well-aligned model refuses to help write a phishing email and explains why phishing is harmful. A poorly aligned model might comply.
See How Models Compare
Understanding alignment matters when choosing an AI model. See how 12 models compare on our leaderboard.