Initialization

Process of setting the initial values of the parameters (weights and biases) of a model before training begins.

Initialization is a critical step in training neural networks because it strongly influences the model's convergence and final performance. Proper initialization keeps the scale of activations and gradients balanced across layers, avoiding the vanishing and exploding gradients that commonly impede learning in deep networks. The initial values serve as the starting point from which optimization algorithms such as gradient descent iteratively update the parameters to minimize the loss function.
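
As a minimal sketch (using NumPy, with a hypothetical single dense layer whose sizes are chosen only for illustration), initialization simply fixes the starting values of the weight matrix and bias vector before the first gradient update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer: 784 inputs, 256 outputs.
fan_in, fan_out = 784, 256

# Naive initialization: small Gaussian weights, zero biases.
# These values are the starting point for gradient descent.
W = rng.normal(loc=0.0, scale=0.01, size=(fan_in, fan_out))
b = np.zeros(fan_out)

# Each gradient-descent step then updates the parameters in place, e.g.:
#   W -= learning_rate * dL_dW
#   b -= learning_rate * dL_db
```

Fixed small constants like the 0.01 standard deviation above work for shallow models but tend to shrink or amplify signals as depth grows, which is exactly the problem the variance-scaled schemes below were designed to address.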

Historical Overview: The importance of initialization became particularly clear with the resurgence of deep learning in the mid-to-late 2000s. Methods such as Xavier initialization (2010) and He initialization (2015) were developed specifically to address the needs of deep neural networks.

Key Contributors: Xavier Glorot and Yoshua Bengio introduced "Xavier initialization" in their 2010 paper, a breakthrough that sets weights so that the variance of each layer's outputs stays roughly constant from layer to layer. Kaiming He et al. followed with "He initialization" in 2015, a variant tailored to layers with ReLU activation functions. These methods have been fundamental in making deeper networks trainable effectively and efficiently.
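
As an illustrative sketch (NumPy; the layer sizes are hypothetical), both schemes scale the weight variance by the layer's fan-in and/or fan-out rather than using a fixed constant:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier uniform initialization: keeps activation variance
    roughly constant across layers (suited to tanh/sigmoid units)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He/Kaiming normal initialization: scales variance by 2 / fan_in
    to compensate for ReLU zeroing out roughly half of its inputs."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Hypothetical layer sizes.
W_tanh = xavier_init(784, 256)   # for a tanh or sigmoid layer
W_relu = he_init(256, 128)       # for a ReLU layer
```

Deep learning frameworks expose these schemes directly (for example as Glorot/Xavier and Kaiming/He initializers), so in practice one usually selects the built-in initializer matching the layer's activation function rather than hand-coding the formulas.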