Ashish Vaswani
(10 articles)
Attention Masking
Technique used in transformer-based models that controls which positions the model may attend to, governing the handling of sequence order and screening out irrelevant elements such as padding in ML tasks.
Generality: 645
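
A minimal sketch of one common form of attention masking, a padding mask applied to raw attention scores before the softmax. The shapes and the use of plain NumPy are illustrative assumptions, not tied to any particular framework.

import numpy as np

# Toy attention scores for one sequence of length 4 (queries x keys).
scores = np.random.randn(4, 4)

# Suppose the last position is padding; it should receive no attention.
key_is_pad = np.array([False, False, False, True])

# Set masked positions to a large negative value so softmax gives them ~0 weight.
masked_scores = np.where(key_is_pad[None, :], -1e9, scores)

weights = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)  # rows sum to 1, padding column ~0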

Attention Projection Matrix
Matrix used in attention mechanisms within neural networks, particularly in transformer models, to project input vectors into query, key, and value vectors.
Generality: 625
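
A sketch of the three learned projection matrices that map the same input vectors to queries, keys, and values; the dimensions (d_model=8, d_k=4) and random initialization are arbitrary assumptions for illustration.

import numpy as np

d_model, d_k, seq_len = 8, 4, 5
x = np.random.randn(seq_len, d_model)     # input token embeddings

# Learned projection matrices (random placeholders here).
W_q = np.random.randn(d_model, d_k)
W_k = np.random.randn(d_model, d_k)
W_v = np.random.randn(d_model, d_k)

Q, K, V = x @ W_q, x @ W_k, x @ W_v       # queries, keys, values for the attention step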

Attention Block
Core component in neural networks, particularly in transformers, designed to selectively focus on the most relevant parts of an input sequence when making predictions.
Generality: 835
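
A rough sketch of a transformer-style attention block in PyTorch: self-attention over the sequence followed by a residual connection and layer normalization. The layer sizes and the choice of nn.MultiheadAttention are assumptions, not a definitive implementation.

import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # Attend over the sequence, then add the residual and normalize.
        attn_out, _ = self.attn(x, x, x)
        return self.norm(x + attn_out)

x = torch.randn(2, 10, 64)       # (batch, sequence length, model dim)
y = AttentionBlock()(x)          # same shape as x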

Positional Encoding
Technique used in neural network models, especially in transformers, to inject information about the order of tokens in the input sequence.
Generality: 762
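
A sketch of the sinusoidal positional encoding used in the original Transformer; max_len and d_model are arbitrary values chosen for illustration.

import numpy as np

def sinusoidal_positional_encoding(max_len=50, d_model=16):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to token embeddings so the model sees token order: embeddings + pe[:seq_len]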

Encoder-Decoder Transformer
Architecture used in NLP for understanding and generating language: an encoder maps the input sequence to a representation, and a decoder generates the output sequence from it.
Generality: 775
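
A minimal sketch of the encode-then-decode flow using PyTorch's built-in nn.Transformer; the dimensions and random inputs are placeholders, and real models would feed embedded, masked token sequences.

import torch
import torch.nn as nn

model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(1, 12, 32)   # encoder input, e.g. an embedded source sentence
tgt = torch.randn(1, 7, 32)    # decoder input, e.g. the target generated so far

out = model(src, tgt)          # decoder output, shape (1, 7, 32)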

Cross-Attention
Mechanism in neural networks that allows the model to weigh and integrate information from different input sources dynamically.
Generality: 675
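
A sketch of cross-attention, where queries come from one source (e.g. the decoder state) while keys and values come from another (e.g. the encoder output). Shapes are illustrative, and the usual learned projections are omitted for brevity.

import numpy as np

d = 16
decoder_state = np.random.randn(7, d)     # queries come from here
encoder_output = np.random.randn(12, d)   # keys and values come from here

scores = decoder_state @ encoder_output.T / np.sqrt(d)    # (7, 12)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
context = weights @ encoder_output                         # (7, d) mix of encoder information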

Masking
Technique used in NLP models to prevent future input tokens from influencing the prediction of current tokens.
Generality: 639
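
A sketch of the causal (look-ahead) mask used in decoder self-attention: an upper-triangular mask hides future positions before the softmax. PyTorch is assumed here purely for illustration.

import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)    # raw attention scores

# True above the diagonal = "future" positions to hide.
future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

weights = torch.softmax(scores.masked_fill(future, float('-inf')), dim=-1)
# Row i now places zero weight on positions j > i.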

Multi-headed Attention
Mechanism in neural networks that allows the model to jointly attend to information from different representation subspaces at different positions.
Generality: 801
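
A sketch of the head-splitting idea: the model dimension is divided into several subspaces, attention runs independently in each, and the heads are concatenated back together. The head count and dimensions are arbitrary, and the usual per-head projections are omitted for brevity.

import numpy as np

batch, seq_len, d_model, n_heads = 1, 6, 16, 4
d_head = d_model // n_heads

x = np.random.randn(batch, seq_len, d_model)

# Split the model dimension into n_heads subspaces: (batch, heads, seq, d_head).
heads = x.reshape(batch, seq_len, n_heads, d_head).transpose(0, 2, 1, 3)

# Scaled dot-product attention runs independently in each head...
scores = heads @ heads.transpose(0, 1, 3, 2) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ heads

# ...and the heads are concatenated back to (batch, seq, d_model).
out = out.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)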

Self-Attention
Mechanism in neural networks that allows models to weigh the importance of different parts of the input data differently.
Generality: 800
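
A minimal NumPy sketch of self-attention, where queries, keys, and values are all derived from the same sequence; the projection matrices are random placeholders rather than trained weights.

import numpy as np

d = 8
x = np.random.randn(4, d)                 # one sequence of 4 token vectors

# Queries, keys, and values all come from the same input (hence "self"-attention).
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d)             # how strongly each token attends to each other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V                      # each token becomes a weighted mix of all tokens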

MLM
Masked-Language Modeling
Training technique where random words in a sentence are replaced with a special token, and the model learns to predict these masked words based on their context.
Generality: 735
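
A sketch of the masking step in masked-language-model training: a random subset of token ids is replaced with a [MASK] id, and the original tokens become the prediction targets. The 15% rate, the mask id, and the ignore value are illustrative assumptions.

import random

MASK_ID = 103          # assumed id of the [MASK] token (e.g. BERT's uncased vocabulary)
IGNORE = -100          # positions the loss function should skip

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels.append(tok)        # the model must predict the original token here...
            inputs[i] = MASK_ID       # ...from a masked input
        else:
            labels.append(IGNORE)     # unmasked positions contribute no loss
    return inputs, labels

inputs, labels = mask_tokens([7592, 2026, 3899, 2003, 10140])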