xLSTM
Extended form of Long Short-Term Memory (LSTM), integrating enhancements for scalability and efficiency in deep learning models.
xLSTM, or Extended Long Short-Term Memory, revises the traditional LSTM architecture with new gating mechanisms and memory structures designed to address the limitations of LSTMs on large-scale data. The model introduces exponential gating with normalization and stabilization to keep the recurrence numerically stable, alongside revised memory cells, including a matrix-memory variant whose update can be fully parallelized. These changes allow xLSTM to compete favorably with more recent architectures such as Transformers, particularly in large language modeling, where scalability and performance are crucial.
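To make the exponential gating concrete: the stabilization works by tracking a running log-scale state so that the exponentials in the input and forget gates never overflow, while the normalized output is unchanged. Below is a minimal NumPy sketch of one such recurrent step under these update equations; the function name `slstm_step`, the stacked weight layout, and the omission of block-diagonal recurrence and memory mixing are simplifications for illustration, not the reference implementation.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One simplified recurrent step with exponential gating and
    log-space stabilization (an illustrative sketch).

    x      : input vector, shape (d_in,)
    h_prev : previous hidden state, shape (d,)
    c_prev : previous cell state, shape (d,)
    n_prev : previous normalizer state, shape (d,)
    m_prev : previous stabilizer state, shape (d,)
    W, R, b: stacked input/recurrent weights and biases for the cell
             input z and the i, f, o gates,
             shapes (4*d, d_in), (4*d, d), (4*d,)
    """
    pre = W @ x + R @ h_prev + b                 # pre-activations
    z_t, i_t, f_t, o_t = np.split(pre, 4)

    z = np.tanh(z_t)                             # cell input
    o = 1.0 / (1.0 + np.exp(-o_t))               # output gate (sigmoid)

    # Exponential input/forget gates, stabilized by the running state m
    # so that every exponential argument is <= 0 and cannot overflow.
    m = np.maximum(f_t + m_prev, i_t)            # new stabilizer state
    i = np.exp(i_t - m)                          # stabilized input gate
    f = np.exp(f_t + m_prev - m)                 # stabilized forget gate

    c = f * c_prev + i * z                       # cell state update
    n = f * n_prev + i                           # normalizer update
    h = o * (c / n)                              # normalized hidden output
    return h, c, n, m

# Example: run a short random sequence with a 4-dim state, 3-dim input.
rng = np.random.default_rng(0)
d_in, d = 3, 4
W = rng.normal(size=(4 * d, d_in))
R = rng.normal(size=(4 * d, d))
b = np.zeros(4 * d)
h = c = n = m = np.zeros(d)
for x in rng.normal(size=(5, d_in)):
    h, c, n, m = slstm_step(x, h, c, n, m, W, R, b)
```

Because the cell state is divided by the normalizer state, the stabilizer m cancels out of the hidden output, so the stabilized recurrence produces the same result as the naive exponential gates without risking overflow.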
xLSTM was introduced in 2024 as part of an effort to scale LSTMs to billions of parameters, a regime typically dominated by Transformers. The development reflects the ongoing evolution of deep learning architectures toward greater efficiency and model capacity.
The xLSTM model was developed by a team including Maximilian Beck, Korbinian Pöppel, and Sepp Hochreiter, among others. Hochreiter is a co-inventor of the original LSTM, giving the work direct continuity with the architecture it extends.