Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford
What is Streaming?
Receiving the model response word-by-word in real-time instead of waiting for the full answer.
Definition
Streaming delivers the model's response token by token as it is generated, rather than holding everything back until the complete response is ready. This dramatically improves perceived latency.
How It Works
Without streaming, a 10-second response shows nothing for 10 seconds, then appears all at once. With streaming, the first token appears after the time to first token (TTFT), typically 0.4–2.1 seconds, and subsequent tokens flow in at 50–150+ tokens per second. Streaming is commonly implemented via Server-Sent Events (SSE).
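The SSE mechanism above can be sketched with a minimal parser. This is an illustrative sketch only: the `token`/`done` event names and the JSON payload shape are assumptions for the example, not any particular provider's wire format.

```python
# Minimal sketch of consuming a Server-Sent Events (SSE) stream.
# Event names and payloads are hypothetical, not a real provider's format.
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from a raw SSE response body."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one SSE event
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []

# Hypothetical stream: tokens arrive one at a time instead of all at once.
stream = (
    'event: token\ndata: {"text": "Hello"}\n\n'
    'event: token\ndata: {"text": ", world"}\n\n'
    'event: done\ndata: {}\n\n'
)

text = ""
for event, data in parse_sse(stream):
    if event == "token":
        text += json.loads(data)["text"]  # append each token as it arrives

print(text)  # -> Hello, world
```

In a real client the loop would render `text` after each token, which is why the first words appear within the TTFT window instead of after the full generation.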
Example
When using Claude in a chat interface, you see the response typed out word by word — that is streaming.
See How Models Compare
Understanding streaming is important when choosing the right AI model. See how 12 models compare on our leaderboard.