Hypersphere-Based Transformer
A modified transformer architecture that leverages the geometry of hyperspheres to improve efficiency and performance.
Hypersphere-Based Transformer is a significant modification of the traditional transformer model used in AI, particularly in Natural Language Processing (NLP). It improves efficiency and performance by exploiting the geometric properties of hyperspheres. Rather than measuring similarity with the raw dot product of two vectors, as conventional transformers do, the Hypersphere-Based Transformer projects queries and keys onto the unit hypersphere and uses the cosine of the angle between them. This changes the core of the self-attention mechanism: because cosine similarity is bounded, attention scores no longer grow with vector magnitude, which can yield more stable and more accurate representations. It also helps the transformer handle much larger input sequences, addressing one of the prime limitations of traditional transformer models.
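The sketch below illustrates the idea under stated assumptions: queries and keys are normalized onto the unit hypersphere so their dot product equals the cosine of the angle between them, and a temperature parameter (here called tau, a hypothetical name; in practice it is often learned) rescales the bounded scores before the softmax. This is a minimal illustration of cosine-based attention, not the implementation from any specific paper.

```python
import numpy as np

def cosine_attention(Q, K, V, tau=10.0):
    """Self-attention where similarity is the cosine of the angle between
    query and key vectors, i.e. a dot product on the unit hypersphere.

    Q, K, V: arrays of shape (seq_len, d_model).
    tau: assumed temperature; cosine scores lie in [-1, 1], so they are
         rescaled before the softmax (often a learned scalar in practice).
    """
    # Project queries and keys onto the unit hypersphere.
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + 1e-8)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + 1e-8)

    # Cosine similarity replaces the scaled dot product of standard attention.
    scores = tau * (Qn @ Kn.T)

    # Standard softmax over keys, then weighted sum of the values.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example usage with random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
out = cosine_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because every score is bounded by the temperature, the softmax is less prone to saturating on a single position, which is one intuition behind the stability claims made for hypersphere-based attention.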
The concept of Hypersphere-Based Transformers is relatively recent, emerging from ongoing research and development in transformer models. It gained recognition following the 2017 introduction of the transformer by Vaswani et al., which revolutionized NLP tasks.
That original transformer model was developed by a team at Google led by Ashish Vaswani. Since then, innovations and improvements to transformer models, including the Hypersphere-Based Transformer, have been the result of collaborative work within the global AI research community, with notable contributions from both academics and industry professionals.