Lukasz Kaiser

(6 articles)

Attention Masking

Technique used in transformer-based models to control which positions in a sequence each token may attend to, for example by hiding padding tokens or positions that come later in the sequence.

Generality: 645
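
As a rough illustration of the masking idea, the sketch below sets the scores of disallowed positions to negative infinity before the softmax, so they receive zero weight. The function names `masked_softmax` and `causal_mask` are hypothetical and chosen only for this example; real models apply the same step to whole tensors.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over one row of attention scores.

    scores: list of floats, one per position.
    mask:   list of bools, True where the position may be attended to.
    """
    # Disallowed positions get -inf, so exp() maps them to exactly 0.
    masked = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("at least one position must be unmasked")
    # Subtract the peak for numerical stability.
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(n):
    """Row i may attend to positions 0..i only (no look-ahead)."""
    return [[j <= i for j in range(n)] for i in range(n)]


weights = masked_softmax([1.0, 2.0, 3.0], [True, True, False])
```

Here the third position is masked, so `weights` has a zero in its last slot and the remaining two entries sum to 1.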

Technique used in neural network models, especially in transformers, to inject information about the order of tokens in the input sequence.

Generality: 762
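
One common way to inject order information is a fixed sinusoidal table that is added to the token embeddings. The sketch below assumes that scheme (learned position embeddings are another option), and the name `sinusoidal_positional_encoding` is hypothetical.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings.

    Even columns hold sines and odd columns hold cosines, with wavelengths
    that grow geometrically across the embedding dimension.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


encoding = sinusoidal_positional_encoding(num_positions=4, d_model=4)
```

Each row of `encoding` is unique to its position, which is what lets an otherwise order-agnostic model tell tokens apart by where they occur.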

Architecture used in NLP for understanding and generating language, in which an encoder maps the input sequence to an internal representation and a decoder generates the output from it.

Generality: 775

Mechanism in neural networks that allows the model to weigh and integrate information from different input sources dynamically.

Generality: 675

Mechanism in neural networks that allows the model to jointly attend to information from different representation subspaces at different positions.

Generality: 801

Mechanism in neural networks that allows models to assign different weights to different parts of the input data according to their relevance.

Generality: 800
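
To make the weighting concrete, here is a minimal sketch assuming the scaled dot-product form of attention used in transformers, for a single query vector. The helper names `softmax` and `attention` are hypothetical and the code is written for readability, not speed.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query.

    query:  list of floats (length d).
    keys:   list of key vectors, each of length d.
    values: list of value vectors, one per key.
    Returns (output vector, attention weights).
    """
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the weighted average of the value vectors.
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


output, weights = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Because both keys match the query equally in this example, the two values receive equal weight and the output is their average.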