Noam Shazeer

(10 articles)

2017

Attention Masking

Technique used in transformer-based models to control which positions each token may attend to, typically hiding padding tokens and, where sequence order matters, positions that come later in the sequence.

Generality: 645

2017

Attention Projection Matrix

Matrix used in attention mechanisms within neural networks, particularly in transformer models, to project input vectors into query, key, and value vectors.

Generality: 625
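
As an illustration of the projection step described above, here is a minimal sketch in plain Python. The helper names (`matvec`, `project_qkv`) and the tiny hand-written weight matrices are assumptions made for the example; in a real model the matrices are learned and the work is done by a tensor library.

```python
def matvec(x, w):
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def project_qkv(x, w_q, w_k, w_v):
    """Project one input vector into its query, key, and value vectors."""
    return matvec(x, w_q), matvec(x, w_k), matvec(x, w_v)

# Hypothetical 2x2 projection matrices, chosen only to make the result easy to read.
w_q = [[1.0, 0.0], [0.0, 1.0]]  # identity
w_k = [[2.0, 0.0], [0.0, 2.0]]  # scales by 2
w_v = [[0.0, 1.0], [1.0, 0.0]]  # swaps the two components

q, k, v = project_qkv([1.0, 2.0], w_q, w_k, w_v)
```

The same input vector yields three different views of itself, which is what lets attention compare queries against keys and then mix values.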

2017

Attention Block

Core component in neural networks, particularly in transformers, designed to selectively focus on the most relevant parts of an input sequence when making predictions.

Generality: 835

2017

Positional Encoding

Technique used in neural network models, especially in transformers, to inject information about the order of tokens in the input sequence.

Generality: 762
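
One common way to inject order information is the fixed sinusoidal scheme; the entry above does not say which scheme is meant, so treat the choice as an assumption. The function name and the base constant 10000 below belong to this sketch rather than to the entry.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position codes."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
```

Each row of `pe` would be added to the token embedding at that position, so identical tokens at different positions receive different inputs.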

2017

Encoder-Decoder Transformer

Architecture used in NLP in which an encoder maps the input sequence to contextual representations and a decoder generates the output sequence from them.

Generality: 775

2017

Cross-Attention

Mechanism in neural networks in which queries derived from one input attend to keys and values derived from another, allowing the model to weigh and integrate information from different input sources dynamically.

Generality: 675

2017

Masking

Technique used in NLP models to prevent future input tokens from influencing the prediction of current tokens.

Generality: 639
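
A minimal sketch of how such a mask could be applied, assuming it acts on raw attention scores before the softmax. The function names are hypothetical, and the scores are made-up values for illustration.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax in which masked-out entries receive zero weight."""
    out = []
    for row, allowed in zip(scores, mask):
        # Disallowed positions are set to -inf so exp() maps them to 0.
        kept = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(kept)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - peak) for s in kept]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Uniform scores make the effect of the mask easy to see.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = masked_softmax(scores, causal_mask(3))
```

With uniform scores, the first position attends only to itself, the second splits its weight over the first two positions, and no position places any weight on a later one.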

2017

Multi-headed Attention

Mechanism in neural networks that allows the model to jointly attend to information from different representation subspaces at different positions.

Generality: 801

2017

Self-Attention

Mechanism in neural networks that lets each position in a sequence weigh the importance of every other position in the same sequence when computing its representation.

Generality: 800
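
The weighting can be sketched with scaled dot-product attention. To keep the example short it assumes identity query, key, and value projections (a real layer would apply learned matrices first), and the function names are chosen for this sketch.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position's query to every position's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row is a convex combination of the input rows, with more weight on the rows most similar to the query.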

2018

MLM
Masked-Language Modeling

Training technique where random words in a sentence are replaced with a special token, and the model learns to predict these masked words based on their context.

Generality: 735
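
The masking step of this training technique can be sketched as follows. The `[MASK]` token string, the 15% masking rate, and the function name are assumptions for illustration; the entry above does not specify them.

```python
import random

MASK = "[MASK]"  # hypothetical name for the special mask token

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace random tokens with MASK and record the originals as labels."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)  # no prediction target at this position
    return masked, labels

sentence = "the cat sat on the mat".split()
masked, labels = mask_tokens(sentence, mask_prob=0.5, seed=1)
```

During training, the loss would be computed only at positions where `labels` is not `None`, so the model learns to recover masked words from their context.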
