Perceptual Hash Algorithm
Generates a unique hash that reflects the visual or auditory similarity of data, such as an image, rather than its exact content.
Perceptual hashing algorithms are designed to create hashes based on the content's characteristics that a human might perceive, such as the appearance of an image, rather than its byte-level composition. Unlike traditional cryptographic hashing (e.g., SHA or MD5) that produces completely different outputs for even the slightest data alteration, perceptual hashing produces similar hashes for similar data, allowing for approximate matching. These algorithms are widely used in tasks like duplicate image detection, content moderation, and copyright infringement detection, where slight variations (like resizing or color adjustments) should not lead to totally different hashes. The algorithms often involve downscaling the image, converting it to grayscale, and extracting key features that capture the essence of the visual information.
The concept of perceptual hashing emerged in the early 2000s, as the need for approximate matching in multimedia content detection grew. The technique gained wider recognition in the mid-2000s, as digital content proliferation required efficient ways to identify similar or derivative works.
One of the early key contributions to the field was the development of pHash (perceptual hash) by Johannes Buchmann and others, which became a popular open-source library for image comparison. Additionally, many advancements came from research in multimedia content retrieval and image processing communities, as the need for efficient algorithms in these areas grew.