RTG 2994 Particle physics at colliders in the LHC precision era

MSc Luigi Di Marino and MSc Rosy Caliri

Julius-Maximilians-Universität Würzburg

Talk: 24 April 2025

Foundation models and Transformers

The Transformer is the first transduction model to rely entirely on self-attention for computing input and output representations, without using sequence-aligned recurrent neural networks (RNNs). Its architecture has since become the building block of foundation models, one of the latest frontiers in artificial intelligence. Trained on large amounts of data via self-supervision, these models demonstrate remarkable versatility and performance across a wide range of tasks. For this reason, they are becoming increasingly appealing in physics research.
In this talk, we will first present the core components of the Transformer—the encoder and decoder—as introduced in the original paper, with a particular focus on their attention mechanisms. We will then introduce the concept of foundation models, highlighting their main characteristics and capabilities.
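The attention mechanism at the heart of the Transformer can be illustrated with a minimal sketch of scaled dot-product attention, the variant used in the original paper. The function name and the use of NumPy are illustrative choices, not part of the talk material:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to stabilize the softmax
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query gets a probability distribution over keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output is a weighted average of the value vectors
    return weights @ V

# Example: 3 tokens, model dimension 4; self-attention uses Q = K = V = X
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one updated representation per token
```

In the full architecture, Q, K, and V are learned linear projections of the token embeddings, and several such attention "heads" run in parallel.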