Last updated: April 5, 2026 · Model Architecture · by Daniel Ashford
What is a Transformer?
The neural network architecture behind all modern LLMs.
Definition
The Transformer is the neural network architecture that powers virtually all modern language models. Introduced in the 2017 paper "Attention Is All You Need," it replaced recurrent sequence models (RNNs and LSTMs) with self-attention, which processes all tokens in parallel rather than one at a time.
How It Works
The key innovation is the attention mechanism, which lets each token "attend to" every other token in the input and learn which parts of the text are most relevant. This parallelism enables massive speedups during training and helps the model capture long-range dependencies. Modern LLMs use decoder-only Transformer variants with modifications such as rotary position embeddings (RoPE) and grouped-query attention (GQA).
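The attention step described above can be sketched in a few lines of NumPy. This is a toy single-head, scaled dot-product self-attention with randomly initialized projection matrices standing in for learned weights; the function name and dimensions are illustrative, not from any particular library.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers
    v = x @ w_v  # values: the content each token carries
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # every token scores every other token
    # row-wise softmax: each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights  # weighted mix of values, plus the weights

# 3 toy tokens with embedding dimension 4; random weights for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape)          # (3, 4): one updated vector per token
print(attn.sum(axis=-1))  # each row sums to 1
```

Because the score matrix compares all tokens against all tokens at once, the whole sequence is processed in a single pass, which is what makes training parallelizable.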
Example
When processing "The cat sat on the mat because it was tired," the Transformer attention mechanism helps the model understand that "it" refers to "the cat" — not "the mat."
See How Models Compare
Understanding the Transformer is important when choosing the right AI model. See how 12 models compare on our leaderboard.