Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford
What is Streaming?
Receiving the model response word-by-word in real-time instead of waiting for the full answer.
Definition
Streaming delivers the model's response token by token as it is generated, rather than holding everything back until the complete response is ready. This dramatically improves perceived latency.
How It Works
Without streaming, a 10-second response shows nothing for 10 seconds, then appears all at once. With streaming, the first token appears after the time to first token (TTFT), typically 0.4–2.1 seconds, and subsequent tokens flow in at 50–150+ tokens per second. Streaming is commonly implemented via Server-Sent Events (SSE).
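The SSE mechanism above can be sketched with a minimal parser. This is an illustrative sketch only: the `token`/`done` event names and the JSON payload shape are assumptions for the example, not any particular provider's wire format.

```python
# Minimal sketch of consuming a Server-Sent Events (SSE) stream.
# Event names and payloads are hypothetical, not a real provider's format.
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from a raw SSE response body."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one SSE event
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []

# Hypothetical stream: tokens arrive one at a time instead of all at once.
stream = (
    'event: token\ndata: {"text": "Hello"}\n\n'
    'event: token\ndata: {"text": ", world"}\n\n'
    'event: done\ndata: {}\n\n'
)

text = ""
for event, data in parse_sse(stream):
    if event == "token":
        text += json.loads(data)["text"]  # append each token as it arrives

print(text)  # -> Hello, world
```

In a real client the loop would render `text` after each token, which is why the first words appear within the TTFT window instead of after the full generation.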
Example
When using Claude in a chat interface, you see the response typed out word by word — that is streaming.
See How Models Compare
Understanding streaming is important when choosing the right AI model. See how 12 models compare on our leaderboard.