
What Is a Transformer in AI?

The transformer is the neural network architecture that powers virtually every modern large language model — including GPT, Claude, Gemini, and Llama.

The Key Innovation: Attention

Introduced in the 2017 paper “Attention Is All You Need,” the transformer replaced older sequential architectures (RNNs, LSTMs) with a mechanism called self-attention.

Self-attention lets the model look at all words in a sentence simultaneously and learn which words are related to each other, regardless of distance:

“The cat sat on the mat because it was tired.”

A transformer understands that “it” refers to “the cat” — not “the mat” — by computing attention scores between all word pairs.
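The attention-score computation described above can be sketched in a few lines of NumPy. This is a deliberately simplified single-head version that omits the learned query, key, and value projection matrices a real transformer uses; the token embeddings are random placeholders, not real word vectors.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over token embeddings X (tokens x dims).

    Simplified: real transformers first project X into separate query, key,
    and value matrices; here the raw embeddings play all three roles.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # attention score for every token pair
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ X                   # each output mixes all tokens

# Toy embeddings for 4 "tokens"
X = np.random.default_rng(0).normal(size=(4, 8))
out = self_attention(X)
print(out.shape)  # (4, 8): one blended vector per token
```

Because every token attends to every other token in one matrix multiply, the distance between "it" and "the cat" in the sentence is irrelevant to the score computation.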

Why Transformers Won

  • Parallelization: Unlike RNNs, transformers process all tokens at once, making them much faster to train on GPUs.
  • Scaling: Performance improves predictably as you add more parameters and data.
  • Versatility: The same architecture works for text, code, images, audio, and video.

Transformer Variants

Type              Used For                      Examples
Decoder-only      Text generation               GPT, Claude, Llama
Encoder-only      Text understanding            BERT, RoBERTa
Encoder-decoder   Translation, summarization    T5, BART

Modern LLMs are almost exclusively decoder-only transformers — trained to predict the next token in a sequence.
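What makes a decoder-only model a *next-token predictor* is the causal mask: each token may attend only to itself and earlier tokens, never to the future. A minimal NumPy sketch of that mask (illustrative only, with zeroed-out scores standing in for real attention scores):

```python
import numpy as np

T = 5  # sequence length

# True above the diagonal = "future" positions that must be hidden.
causal_mask = np.triu(np.ones((T, T), dtype=bool), k=1)

scores = np.zeros((T, T))            # placeholder attention scores
scores[causal_mask] = -np.inf        # masked positions get zero weight after softmax

weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
# Row i of `weights` is nonzero only for positions 0..i, so the prediction
# for token i+1 can depend only on tokens the model has already seen.
print(weights[0])  # [1. 0. 0. 0. 0.]
```

During training, this mask lets the model learn next-token prediction for every position in the sequence in parallel, which is the parallelization advantage mentioned above.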

Elvean brings all these concepts together in one native Mac app — local models, cloud APIs, agentic tools, and more.

Learn more about Elvean