Noam Shazeer
(10 articles)
Attention Masking
Technique used in transformer-based models to control which positions each token may attend to, for example hiding padding tokens or positions that come later in the sequence.
Generality: 645

Attention Projection Matrix
Matrix used in attention mechanisms within neural networks, particularly in transformer models, to project input vectors into query, key, and value vectors.
Generality: 625

Attention Block
Core component in neural networks, particularly in transformers, designed to selectively focus on the most relevant parts of an input sequence when making predictions.
Generality: 835

Positional Encoding
Technique used in neural network models, especially in transformers, to inject information about the order of tokens in the input sequence.
Generality: 762
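
One common variant is the fixed sinusoidal encoding. The sketch below is an illustration under that assumption, not the only way to inject order information; the function name and the base of 10000 are choices made for this example.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Each row would be added to the token embedding at that position.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
```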

Encoder-Decoder Transformer
Transformer architecture used in NLP in which an encoder maps the input sequence to contextual representations and a decoder generates the output sequence from them.
Generality: 775

Cross-Attention
Mechanism in neural networks that lets one sequence attend to another, so the model can dynamically weigh and integrate information from a different input source.
Generality: 675

Masking
Technique used in NLP models to prevent future input tokens from influencing the prediction of current tokens.
Generality: 639

Multi-headed Attention
Mechanism in neural networks that allows the model to jointly attend to information from different representation subspaces at different positions.
Generality: 801

Self-Attention
Mechanism in neural networks that lets each element of an input sequence weigh the importance of every other element of the same sequence.
Generality: 800
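
A minimal sketch of scaled dot-product self-attention, assuming identity query, key, and value projections to keep it short (a real layer would use learned projection matrices). The `causal` flag and the function names are hypothetical and only illustrate how masking future positions fits in.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """x is a list of token vectors; returns (outputs, attention weights).

    Simplification: queries, keys, and values are the inputs themselves.
    """
    d = len(x[0])
    outputs, weights = [], []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                # Masked position: contributes zero weight after softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, causal=True)
```

With `causal=True` the first token can only attend to itself, so its output equals its input.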

MLM
Masked-Language Modeling
Training technique where random words in a sentence are replaced with a special token, and the model learns to predict these masked words based on their context.
Generality: 735
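
The masking step can be sketched as follows. This is a simplified illustration: the `[MASK]` string, the 15% masking rate, and the function name are assumptions for the example, and real training recipes often add further replacement rules that are omitted here.

```python
import random

MASK = "[MASK]"  # assumed special token for this sketch

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace random tokens with MASK; return (masked tokens, labels).

    labels[i] holds the original token where a mask was applied,
    and None elsewhere, so the loss is computed on masked positions only.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

sentence = "the model learns to predict masked words from context".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
```

The model would receive `masked` as input and be trained to recover the non-`None` entries of `labels`.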