Encoder-Decoder Transformer

An architecture used in NLP for understanding and generating language by encoding an input sequence and decoding an output sequence.

An Encoder-Decoder Transformer is a deep learning model built on self-attention: the encoder learns which parts of the input sequence to attend to when building a representation, and the decoder generates an output sequence conditioned on that representation. This structure is widely applied in NLP to problems such as machine translation, summarization, and text generation. The transformer architecture replaces the sequential processing of traditional recurrent neural networks with attention mechanisms, enabling parallel computation and handling long-range dependencies in sentences more effectively.
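As a minimal sketch of this encoder-decoder flow, the snippet below uses PyTorch's built-in nn.Transformer. The hyperparameters (512-dimensional model, 8 attention heads, 6 encoder and 6 decoder layers) mirror the original paper but are illustrative, the vocabulary size is hypothetical, and positional encodings are omitted for brevity; this is not the paper's reference implementation.

import torch
import torch.nn as nn

d_model = 512          # model (embedding) dimension, as in the original paper
vocab_size = 1000      # hypothetical vocabulary size for this toy example

embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(
    d_model=d_model,
    nhead=8,               # attention heads per layer
    num_encoder_layers=6,  # encoder stack depth
    num_decoder_layers=6,  # decoder stack depth
)
generator = nn.Linear(d_model, vocab_size)  # projects decoder states to vocabulary logits

# Toy batch: source length 10, target length 7, batch size 2.
# Default nn.Transformer layout is (sequence_length, batch, d_model).
src_tokens = torch.randint(0, vocab_size, (10, 2))
tgt_tokens = torch.randint(0, vocab_size, (7, 2))

src = embed(src_tokens)   # encoder input (a real model would add positional encodings)
tgt = embed(tgt_tokens)   # decoder input (the shifted target during training)

# Causal mask so each target position attends only to earlier positions.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_tokens.size(0))

out = transformer(src, tgt, tgt_mask=tgt_mask)  # encode src, then decode with cross-attention
logits = generator(out)                         # (7, 2, vocab_size) next-token scores
print(logits.shape)

In practice, training minimizes cross-entropy between these logits and the target tokens, and generation runs the decoder autoregressively, feeding back each predicted token.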

Proposed in the paper "Attention is All You Need" by Vaswani et al. in 2017, the Encoder-Decoder Transformer has since become a mainstay of NLP. Its principles have given rise to advanced models such as BERT and GPT, which are driving the recent AI revolution in automation and language understanding tasks.

The model was first introduced by a group at Google Brain led by Ashish Vaswani; notable researchers in this group also include Noam Shazeer, Niki Parmar, and Jakob Uszkoreit. Their work set the path for subsequent progress in NLP tasks.
