Last updated: April 5, 2026 · Prompting & Usage · by Daniel Ashford

What is Max Tokens?

QUICK ANSWER

An API parameter that limits how long the model response can be.

Definition

Max tokens sets the maximum number of tokens the model can generate in a single response. Once the limit is reached, generation stops, even mid-sentence. It is an important lever for cost control and response sizing.

How It Works

If set too low, responses may be cut off mid-thought. If set too high, the model may run unnecessarily long. A more effective approach is to instruct the model to be brief in the prompt itself, which encourages complete responses within the desired length, with max tokens acting as a hard safety cap rather than the primary length control.
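A minimal sketch of that pattern, assuming an OpenAI-style request shape (the model name and field names here are illustrative placeholders, not a specific vendor's API): the brevity instruction lives in the prompt, while max_tokens is only the backstop.

```python
def build_request(prompt: str, max_tokens: int = 100) -> dict:
    """Build a chat-style request body (shape is an assumption for illustration)."""
    return {
        "model": "example-model",  # placeholder model name, not a real endpoint
        "messages": [
            # The brevity instruction does the real length control...
            {"role": "system", "content": "Answer in two sentences or fewer."},
            {"role": "user", "content": prompt},
        ],
        # ...while max_tokens acts only as a hard safety cap on output length.
        "max_tokens": max_tokens,
    }

req = build_request("What is a token?")
```

With this split, a well-behaved model finishes its sentence naturally, and the cap only fires if the model ignores the instruction.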

Example

Setting max_tokens=100 for a chatbot keeps responses concise. But a complex answer that needs 150 tokens will be cut off at token 100, possibly mid-sentence.
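The cutoff behavior can be simulated in a few lines (a toy model of generation, not a real decoder): a 150-token answer under a 100-token cap simply loses its last 50 tokens.

```python
def generate_with_cap(answer_tokens: list, max_tokens: int) -> list:
    """Simulate hard truncation: generation halts the moment the cap is hit,
    regardless of whether the answer is complete."""
    return answer_tokens[:max_tokens]

full_answer = [f"tok{i}" for i in range(150)]  # an answer that needs 150 tokens
capped = generate_with_cap(full_answer, 100)   # only the first 100 survive
```

Note the model gets no chance to wrap up: truncation is a hard stop, not a summarization.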

Related Terms

Tokens
The basic units of text that LLMs process — roughly 3/4 of a word.
Output Tokens
The tokens the model generates in its response — the most expensive part of API usage.
LLM API Pricing
The cost of using language models, typically measured in dollars per million tokens.
API (Application Programming Interface)
The technical interface that lets your software send prompts to an LLM and receive responses.
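Because output tokens are billed per million, max tokens also bounds the worst-case cost of a single response. A quick sketch of that arithmetic (the $15-per-million price is a hypothetical figure, not a quote for any particular model):

```python
def worst_case_output_cost(max_tokens: int, usd_per_million: float) -> float:
    """Output cost scales linearly with generated tokens, so max_tokens
    puts a hard ceiling on what one response can cost."""
    return max_tokens * usd_per_million / 1_000_000

# With a hypothetical output price of $15 per million tokens,
# a 1,000-token cap limits any single response to $0.015 of output spend.
cap = worst_case_output_cost(1000, 15.00)
```

Actual spend is usually lower, since most responses stop well before the cap.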

See How Models Compare

Understanding max tokens is important when choosing the right AI model. See how 12 models compare on our leaderboard.

Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated