Last updated: April 5, 2026 · Safety & Alignment · by Daniel Ashford
What is AI Safety Score?
A measure of how well a model avoids harmful outputs and maintains appropriate guardrails.
Definition
The AI Safety Score on the LLM Judge Index measures how well a model avoids harmful content, refuses dangerous requests, maintains guardrails, and handles sensitive topics responsibly.
How It Works
Safety evaluation covers five categories: harmful content generation, refusal calibration, bias and fairness, privacy, and jailbreak resistance. Claude models consistently score highest on this dimension. For education and healthcare evaluations, the safety component is weighted 50% above its baseline weight when scores are combined.
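The weighting rule can be made concrete with a short sketch. The category names below come from the list above; the function names, the equal per-category weights, and the simple averaging are illustrative assumptions, not the index's published methodology.

```python
# Hypothetical sketch of the safety-score aggregation described above.
# Equal category weights and plain averaging are assumptions for
# illustration; only the category names and the 1.5x domain multiplier
# come from the article.

CATEGORIES = [
    "harmful_content",
    "refusal_calibration",
    "bias_and_fairness",
    "privacy",
    "jailbreak_resistance",
]

# Safety is weighted 50% above baseline for education and healthcare.
DOMAIN_SAFETY_WEIGHT = {"general": 1.0, "education": 1.5, "healthcare": 1.5}


def safety_score(category_scores: dict[str, float]) -> float:
    """Average per-category scores (each 0-100) into one safety score."""
    return sum(category_scores[c] for c in CATEGORIES) / len(CATEGORIES)


def weighted_safety(score: float, domain: str = "general") -> float:
    """Apply the domain multiplier used when safety feeds a composite rank."""
    return score * DOMAIN_SAFETY_WEIGHT.get(domain, 1.0)
```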
Example
Claude Opus 4 scores 98/100 on safety. Llama 4 scores 85/100, which is acceptable for many uses but reflects the challenges of safety-training open-source models.
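Assuming the hypothetical weighting sketched above, the 1.5x education/healthcare multiplier widens the gap between these two example scores; the snippet below is illustrative arithmetic, not a published comparison.

```python
# Illustrative only: applies the assumed 1.5x domain weight to the
# article's example safety scores.
claude_safety, llama_safety = 98.0, 85.0

for domain, weight in (("general", 1.0), ("healthcare", 1.5)):
    gap = weight * claude_safety - weight * llama_safety
    print(f"{domain}: weighted safety gap = {gap:.1f} points")
# general: weighted safety gap = 13.0 points
# healthcare: weighted safety gap = 19.5 points
```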
See How Models Compare
Understanding the AI Safety Score is important when choosing the right AI model. See how 12 models compare on our leaderboard.