Transformer (architecture)

Neural sequence model built mainly on attention: parallelizable and foundational for LLMs since "Attention Is All You Need" (2017).

The Transformer architecture replaced many recurrent motifs with self-attention: each token attends to others to build contextual vectors. Encoder-only, decoder-only, encoder–decoder hybrids exist (BERT, GPT-style causally etc.).

Nearly all contemporary large language models (LLMs) use transformers at scale for natural language processing (NLP): pretrained on token corpora via training, then optionally fine-tuned for chatbots or tools before inference at run time.