Last updated: April 5, 2026 · Safety & Alignment · by Daniel Ashford
What is Constitutional AI?
Anthropic's approach to AI safety, which trains models against written principles rather than relying solely on human ratings.
Definition
Constitutional AI (CAI) is an alignment methodology where a model is trained to follow written principles (a "constitution") rather than relying solely on human feedback for every decision.
How It Works
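CAI training has two phases. (1) Self-critique: the model drafts a response, critiques it against the principles, and revises it; the revised responses are then used for supervised fine-tuning. (2) RLAIF (reinforcement learning from AI feedback): a model guided by the constitution compares candidate outputs, and these AI-generated preferences take the place of human ratings when training the reward model. Because fewer human labels are needed, this can scale more readily than RLHF, and the written principles make the intended values easier to inspect. Claude models are trained with Constitutional AI.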
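The two phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the `Model` type, the prompt templates, and the functions `critique_and_revise`, `ai_preference`, and `toy_model` are all hypothetical names introduced here, and `toy_model` is a canned stand-in so the example runs without a real language model.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any callable that maps a prompt string to a completion.
Model = Callable[[str], str]

CONSTITUTION: List[str] = [
    "Choose the response that is most helpful while being least likely to cause harm.",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> Tuple[str, str]:
    """Phase 1: draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique the response against this principle.\n"
            f"Principle: {principle}\nResponse: {response}"
        )
        response = model(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # The (prompt, revised response) pair becomes a supervised fine-tuning example.
    return prompt, response


def ai_preference(judge: Model, prompt: str, a: str, b: str, principle: str) -> int:
    """Phase 2: ask a judge model which response better fits the principle.

    Returns 0 if the judge prefers response A, 1 if it prefers response B.
    """
    verdict = judge(
        f"Compare two responses under this principle.\n"
        f"Principle: {principle}\nPrompt: {prompt}\n"
        f"A: {a}\nB: {b}\nAnswer with A or B."
    )
    return 0 if verdict.strip().upper().startswith("A") else 1


def toy_model(text: str) -> str:
    """Canned stand-in for a real language model (assumption for this sketch)."""
    if text.startswith("Critique"):
        return "The draft could be more careful."
    if text.startswith("Revise"):
        return "revised response"
    if text.startswith("Compare"):
        return "B"
    return "draft response"


# Phase 1 yields a fine-tuning pair; phase 2 yields a preference label.
sft_example = critique_and_revise(toy_model, "How do locks work?", CONSTITUTION)
label = ai_preference(
    toy_model, "How do locks work?", "draft response", sft_example[1], CONSTITUTION[0]
)
```

In a real pipeline, many such labels would be collected to train a reward model, which then drives the reinforcement learning step; the sketch stops at producing a single label.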
Example
An illustrative principle of the kind a constitution might contain: "Choose the response that is most helpful while being least likely to cause harm. If a request could be interpreted as harmful or benign, assume the benign interpretation."