Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford

What is Prompt Caching?

QUICK ANSWER

A feature that reduces input token costs by up to 90% by reusing the processed form of system prompts and other repeated prompt prefixes.

Definition

Prompt caching stores the processed representation of static prompt components so they do not need to be recomputed on every request. This can reduce input token costs by up to 90%.

How It Works

The provider saves the computed state of your static prompt prefix. Subsequent requests that begin with an identical prefix reuse the cached computation instead of reprocessing those tokens. Cached entries expire after a TTL, typically 5-60 minutes depending on the provider. Because matching is exact, even a small change to the static portion breaks the cache and forces a full recomputation.
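The mechanics described above can be sketched as a toy in-memory cache. This is an illustrative model, not any provider's actual implementation; the class name, the TTL-refresh behavior, and the placeholder "computed state" are all assumptions made for the example.

```python
import time
import hashlib

class PromptPrefixCache:
    """Toy model of provider-side prompt caching (illustrative only)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # prefix hash -> (computed_state, expiry_time)

    def _key(self, prefix: str) -> str:
        # Matching is on the exact prefix: any edit changes the hash.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def process(self, static_prefix: str, dynamic_suffix: str) -> bool:
        """Return True on a cache hit. The suffix is always processed fresh."""
        key = self._key(static_prefix)
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            cache_hit = True          # reuse the saved prefix computation
        else:
            cache_hit = False         # stands in for the expensive prefill pass
        # Assumption: a read refreshes the TTL, as some providers do.
        self._store[key] = ("computed_state", now + self.ttl)
        return cache_hit

cache = PromptPrefixCache(ttl_seconds=300)
print(cache.process("SYSTEM PROMPT", "Hello"))     # False: first request computes
print(cache.process("SYSTEM PROMPT", "Hi again"))  # True: identical prefix reused
print(cache.process("SYSTEM PROMPT!", "Hi"))       # False: one-character change misses
```

The third call shows why prompt structure matters: keep the unchanging portion at the front of the prompt, byte-for-byte identical, and append the variable content after it.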

Example

Your chatbot sends the same 2,000-token system prompt with every message. With caching, subsequent requests are billed at roughly 10% of the normal input rate for those tokens.
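The savings can be worked through with a quick calculation. The $3.00-per-million-token rate below is a hypothetical figure chosen for illustration; the 10% cached rate comes from the example above.

```python
# Hypothetical input rate: $3.00 per million tokens; cached reads at 10% of that.
RATE_PER_M = 3.00
CACHED_FRACTION = 0.10

system_tokens = 2_000   # the repeated system prompt
messages = 100          # requests within the cache TTL

# Without caching: every request pays full price for the system prompt.
uncached = messages * system_tokens / 1_000_000 * RATE_PER_M

# With caching: the first request pays full price to populate the cache,
# the remaining 99 pay the discounted cached rate.
cached = (system_tokens / 1_000_000 * RATE_PER_M
          + (messages - 1) * system_tokens / 1_000_000 * RATE_PER_M * CACHED_FRACTION)

print(f"uncached: ${uncached:.4f}")   # $0.6000
print(f"cached:   ${cached:.4f}")     # $0.0654
```

Over 100 messages, the system-prompt portion of the bill drops from $0.60 to about $0.065, roughly an 89% saving, approaching the 90% ceiling as the conversation grows longer.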

Related Terms

LLM API Pricing
The cost of using language models, typically measured in dollars per million tokens.
Input Tokens
The tokens in your prompt that the model reads — cheaper than output tokens.
System Prompt
Persistent instructions that define how the model should behave.
API (Application Programming Interface)
The technical interface that lets your software send prompts to an LLM and receive responses.

See How Models Compare

Understanding prompt caching is important when choosing the right AI model. See how 12 models compare on our leaderboard.
