Last updated: April 5, 2026 · Core Concepts · by Daniel Ashford
What Are Tokens?
The basic units of text that LLMs process — roughly 3/4 of a word.
Definition
Tokens are the fundamental units of text that language models process. Rather than reading whole words, LLMs break text into smaller pieces called tokens using a process called tokenization. One token is approximately 3-4 characters or roughly 75% of a word in English.
How It Works
Different models use different tokenization methods. GPT models use Byte-Pair Encoding (BPE), while others use SentencePiece or WordPiece. Common words like "the" are usually a single token, while uncommon words may be split into multiple tokens. Token count matters because LLMs have maximum context windows measured in tokens, and API pricing is calculated per token.
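The BPE idea above can be sketched in a few lines: start from individual characters and repeatedly merge the most frequent adjacent pair. This is a toy illustration of the merge loop, not any model's actual tokenizer (real BPE learns merges from a large corpus and applies them at byte level).

```python
from collections import Counter

def bpe_merges(text, num_merges):
    """Toy Byte-Pair Encoding: repeatedly merge the most frequent
    adjacent symbol pair. Illustrative only, not a real tokenizer."""
    symbols = list(text)  # start from individual characters
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats; nothing worth merging
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)  # replace the pair with one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_merges("low lower lowest", 4))
```

After a few merges, the repeated stem "low" collapses into a single token while rarer endings like "r", "s", "t" stay split — the same reason common English words are one token and uncommon words are several.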
Example
"Hello, how are you?" is approximately 6 tokens. A typical page of English text is about 500-700 tokens. GPT-5.3 Codex charges $10 per million input tokens.
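The 3-4 characters-per-token rule of thumb makes quick estimates easy. A minimal sketch, assuming English text and a flat 4-characters-per-token ratio (real tokenizers will differ; an actual model may count "Hello, how are you?" as 6 tokens rather than the 5 this heuristic gives):

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English.
    Real counts vary by model and tokenizer."""
    return max(1, round(len(text) / 4))

def estimate_cost_usd(text, usd_per_million_tokens):
    """Estimated input cost at a given per-million-token rate."""
    return estimate_tokens(text) * usd_per_million_tokens / 1_000_000

print(estimate_tokens("Hello, how are you?"))        # 19 chars -> ~5 tokens
print(estimate_cost_usd("Hello, how are you?", 10))  # at $10/M input tokens
```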