Model Compression

Techniques designed to reduce the size of a machine learning model without significantly sacrificing its accuracy.

Model compression addresses the challenge of deploying large, resource-intensive machine learning models in environments with limited compute or memory, such as mobile and edge devices. Common strategies include quantization (storing weights in lower-precision formats), pruning (removing redundant weights or structures), knowledge distillation (training a small model to mimic a large one), and the use of more efficient architectures. Together, these techniques reduce a model's size and computational demands, allowing powerful AI models to run on a wider range of devices and improving accessibility in real-world applications such as mobile apps, IoT devices, and embedded systems.
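Two of these ideas are simple enough to sketch directly. Below is a minimal, illustrative NumPy example of per-tensor symmetric int8 quantization and unstructured magnitude pruning; the function names and the simple per-tensor scheme are assumptions chosen for clarity, not how any particular library implements them.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: float32 -> int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0            # one scale per tensor (assumed scheme)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for computation."""
    return q.astype(np.float32) * scale

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print("mean quantization error:", np.abs(dequantize(q, s) - w).mean())
    pruned = prune_by_magnitude(w, sparsity=0.9)
    print("fraction of weights zeroed:", (pruned == 0).mean())
```

Storing weights as int8 cuts their memory footprint to a quarter of float32, at the cost of a small rounding error per weight; pruning trades a controlled drop in accuracy for sparsity that compression-aware runtimes can exploit.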

Model compression techniques have been researched and developed over the past decade, with interest growing sharply in the early 2010s as deep learning models began to increase dramatically in size and complexity.

Many researchers and organizations have contributed to the field of model compression. Notably, Geoffrey Hinton, Oriol Vinyals, and Jeff Dean introduced knowledge distillation in 2015, and it has become a cornerstone technique in model compression. Significant contributions have also come from academic institutions and technology companies worldwide, which continue to evolve the methods and applications of compressed models.
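For reference, the distillation objective from that 2015 paper combines a softened teacher-matching term with the usual hard-label loss. The sketch below follows that formulation in plain NumPy; the temperature T=4.0 and weight alpha=0.7 are assumed example hyperparameters, not values prescribed by the paper.

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term (scaled by T^2, as in Hinton et al. 2015) plus hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) over temperature-softened distributions
    soft = (p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12))).sum(axis=-1)
    # Standard cross-entropy against the true labels at T = 1
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return (alpha * T**2 * soft + (1 - alpha) * hard).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 10))
    student = rng.normal(size=(8, 10))
    y = rng.integers(0, 10, size=8)
    print("distillation loss:", distillation_loss(student, teacher, y))
```

The T^2 factor keeps the gradient magnitude of the soft term roughly comparable across temperatures, so alpha retains its meaning when T is tuned.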