Temperature

Hyperparameter that controls the randomness of a model's predictions by rescaling the probability distribution over the output classes, typically by dividing the logits by the temperature before the softmax, making predictions more or less deterministic.

In machine learning, and specifically in natural language processing, temperature plays a critical role in controlling the randomness of model predictions. A lower temperature makes the model's output more deterministic: the probability distribution is sharpened so that the highest-probability outcomes become even more likely and lower-probability outcomes become less likely. This is particularly useful in tasks such as text generation and in reinforcement learning, where controlling the exploration-exploitation trade-off is crucial. Conversely, a higher temperature increases randomness, allowing more exploration of potential outcomes, which can help avoid local minima in optimization procedures such as simulated annealing and can yield more diverse outputs from generative models.
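As a concrete sketch of how this works in practice (assuming the common formulation used by language models, where the logits z_i are divided by the temperature T before a softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T)), the following NumPy example uses made-up logits for four candidate tokens and prints the resulting distributions at several temperatures:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities,
    dividing by the temperature before the softmax."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                 # subtract max for numerical stability
    exp_scores = np.exp(scaled)
    return exp_scores / exp_scores.sum()

# Hypothetical logits for four candidate tokens (illustrative values only)
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature=t)
    print(f"T={t}: {np.round(probs, 3)}")
# Lower T sharpens the distribution toward the top-scoring token;
# higher T flattens it, so sampling becomes more random.

# Sampling one token index from the tempered distribution
rng = np.random.default_rng(seed=0)
token_index = rng.choice(len(logits), p=softmax_with_temperature(logits, temperature=0.8))
print("sampled index:", token_index)
```

At T = 0.5 the probability mass concentrates on the highest-scoring token, while at T = 2.0 the distribution is noticeably flatter; in the limit T → 0 sampling reduces to greedy argmax selection.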

Temperature in this context has no specific "first use" date, as it is a general principle borrowed from statistical mechanics, where it governs the distribution over the states of a system (the Boltzmann distribution). Its application in machine learning and natural language processing became prominent with the rise of sophisticated models and algorithms, especially deep learning models, from the late 2000s onwards.

Key contributors to the development and application of temperature in AI are difficult to pinpoint because of its widespread use across multiple disciplines and its fundamental nature. It is, however, an essential concept in the work of researchers developing advanced neural networks, optimization algorithms, and generative models, where controlling the randomness and exploration of model behavior is central.