ImageNet contains naturally occurring NeuralHash collisions
Blog post from Roboflow
Apple's NeuralHash, a perceptual hashing model designed to detect child sexual abuse material (CSAM), has drawn scrutiny over potential false positives and its vulnerability to adversarial attacks. The algorithm produces a 96-bit hash for each image, and two hashes should match only when the underlying images are nearly identical. Testing, however, turned up pairs of distinct images that share the same hash, both naturally occurring and artificially constructed, underscoring that a finite 96-bit hash space cannot rule out collisions.

Apple describes the system as robust with a very low false-positive rate, yet researchers have found real-world collisions, which suggests the effective rate may be somewhat higher than reported. The ability to craft images whose hashes match known CSAM hashes raises additional security and privacy concerns. The opacity of the process for adding images to the CSAM database is a further risk, since malicious actors could exploit it for purposes beyond the system's intended use.
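
To make the collision argument concrete, here is a small Python sketch; it is not Apple's implementation, and the representation of a hash as a Python integer and the helper names are assumptions for illustration. It shows how an exact-match comparison over 96-bit hashes works and uses a birthday-bound estimate of how many collisions pure chance would predict across an ImageNet-scale collection.

```python
import math

HASH_BITS = 96  # NeuralHash output size described in the post


def expected_random_collisions(num_images: int, bits: int = HASH_BITS) -> float:
    """Birthday-bound estimate of colliding pairs if hashes were uniformly
    random: roughly C(n, 2) / 2**bits."""
    pairs = num_images * (num_images - 1) / 2
    return pairs / float(2 ** bits)


def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two hashes (held here as ints)."""
    return bin(h1 ^ h2).count("1")


def is_match(h1: int, h2: int, threshold: int = 0) -> bool:
    """Exact match when threshold=0; a nonzero threshold is a hypothetical
    relaxation for experimentation, not part of the described system."""
    return hamming_distance(h1, h2) <= threshold


if __name__ == "__main__":
    # ILSVRC-2012 training set size (1,281,167 images).
    n = 1_281_167
    print(f"Expected chance collisions among {n:,} random 96-bit hashes: "
          f"{expected_random_collisions(n):.2e}")
    # The estimate is on the order of 1e-17, so any collision actually
    # observed in a dataset of this size is telling us about the perceptual
    # model's behavior rather than about random chance.
```

Under these assumptions, the vanishingly small birthday-bound figure is what makes the naturally occurring ImageNet collisions noteworthy: they point at how the model maps images to hashes, not at the raw size of the 96-bit space.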