Dimension
Number of features or attributes that represent a data point in a vector space.
The dimension of an embedding is a crucial concept in machine learning, particularly when working with large and complex datasets. It is the size of the vector space into which data points are mapped. Each dimension corresponds to a feature that captures a specific aspect of the data, and higher dimensions typically allow a more nuanced representation of relationships in the data. However, higher-dimensional embeddings also invite the "curse of dimensionality": the volume of the space grows so quickly that the available data becomes sparse, making it difficult to train models effectively without overfitting.
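To make the sparsity effect concrete, the following minimal sketch (assuming NumPy; the point counts and dimensions are arbitrary illustrative choices) samples random points in a unit hypercube and compares the nearest and farthest distances from a query point. As the dimension grows, the two converge, so "near" and "far" neighbors become hard to distinguish.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n_points=500):
    """Sample random points in the unit hypercube and return the
    nearest and farthest Euclidean distances from one query point."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min(), dists.max()

for dim in (2, 10, 100, 1000):
    d_min, d_max = distance_spread(dim)
    # The min/max ratio approaches 1 as the dimension grows:
    # distances concentrate, which is one face of the curse of dimensionality.
    print(f"dim={dim:5d}  min={d_min:.3f}  max={d_max:.3f}  ratio={d_min / d_max:.3f}")
```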
The concept of dimension in the context of embeddings became particularly significant with the rise of vector space models in natural language processing and information retrieval. Techniques such as latent semantic analysis (LSA) in the 1990s and, later, word embedding methods such as Word2Vec and GloVe in the 2010s highlighted how the choice of dimensionality affects a model's ability to capture the semantic and syntactic properties of words.
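In practice, the embedding dimension is an explicit hyperparameter of such methods. The sketch below (assuming the gensim library and a toy corpus, neither of which is part of the entry above) shows the dimension being set when training Word2Vec vectors and then read back as the length of a word vector.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (illustrative only).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# vector_size (gensim 4.x) sets the embedding dimension,
# i.e. the length of every learned word vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)

vec = model.wv["cat"]
print(vec.shape)  # (50,) -- one coordinate per dimension
```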
Prominent figures in the development of embedding methods include Thomas K. Landauer and Susan T. Dumais, who were instrumental in developing latent semantic analysis in the 1990s. More recently, Tomas Mikolov and colleagues at Google developed Word2Vec, further advancing the practical application of embeddings in AI.