Computer Science & AI · 2017
Attention Is All You Need
Ashish Vaswani, et al. (Google Brain / Google Research)
Overview
The paper that introduced the Transformer, an architecture built entirely on attention mechanisms with no recurrence or convolution. It became the foundation of modern large language models such as GPT and BERT.
The architecture behind the large-language-model revolution.
Key findings
Methods
An encoder–decoder neural network using multi-head self-attention and position encodings, evaluated on machine-translation benchmarks against recurrent and convolutional baselines.
Keywords
Related papers
Computer Science & AI
Highly Accurate Protein Structure Prediction with AlphaFold
AlphaFold solved a 50-year grand challenge in biology: predicting a protein's three-dimensional shape from its amino-acid sequence. Its accuracy approached that of experimental methods, transforming structural biology.
Computer Science & AI
Deep Learning
A landmark review by three pioneers of the field, synthesising how deep neural networks learn layered representations of data. It explains why depth works and surveys the advances that made deep learning dominant across vision, speech, and language.
Computer Science & AI
Generative Adversarial Nets
Goodfellow and colleagues introduced generative adversarial networks (GANs), a way to train generative models by pitting two networks against each other — one creating fake data, the other trying to detect it — until the fakes are indistinguishable from real data.