PerTh Multimodal: Enterprise Multimodal Watermarking for Audio, Video, Image, and Text
Blog post from Resemble AI
PerTh Multimodal is a watermarking solution for audio, video, image, and text, built to meet the growing need for robust, imperceptible watermarks driven by regulatory requirements such as the EU AI Act. Originally launched as an audio-only watermarker, PerTh has evolved into a multimodal platform, and the post reports near-perfect accuracy in media watermark recovery. It embeds tamper-resistant signatures using modality-specific encoding: psychoacoustic methods for audio, pixel-level modifications for images and video, and linguistic rewriting for text. The new version addresses limitations of the original PerTh audio model through a novel training attack system and curriculum design, with diverse data augmentations intended to make the watermark robust to real-world transformations. For integration, PerTh offers ONNX support for edge deployment, which aids interoperability and compliance with legal mandates, and it exposes detailed threshold settings for balancing detection rate against false positives. The platform also complements metadata solutions such as C2PA to provide dual-layer provenance, so content integrity can still be verified after re-encoding or format changes. Despite these advances, PerTh's explicit watermarking model needs further evaluation on non-speech audio and in real-world settings before its potential and limitations are fully understood.
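The trade-off between detection rate and false positives mentioned above can be illustrated with a minimal sketch. It does not use PerTh's API: the detector scores are synthetic, and the names `rate_at_or_above` and `pick_threshold` are hypothetical helpers introduced only for this example. The assumption is that a detector returns a confidence score per file and that a file is flagged as watermarked when its score is at or above a chosen threshold.

```python
import random


def rate_at_or_above(scores, threshold):
    """Fraction of scores that would be flagged at this threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def pick_threshold(clean_scores, max_false_positive_rate):
    """Lowest threshold whose false-positive rate on clean (unwatermarked)
    samples stays within the allowed budget."""
    # Candidates are the observed clean scores, plus infinity so that a
    # budget of zero false positives is always reachable.
    candidates = sorted(set(clean_scores)) + [float("inf")]
    for threshold in candidates:  # ascending: first hit is the most sensitive
        if rate_at_or_above(clean_scores, threshold) <= max_false_positive_rate:
            return threshold


# Synthetic scores (an assumption, not measured PerTh output):
# clean media cluster near 0.2, watermarked media near 0.8.
random.seed(0)
clean = [random.gauss(0.2, 0.1) for _ in range(1000)]
marked = [random.gauss(0.8, 0.1) for _ in range(1000)]

threshold = pick_threshold(clean, max_false_positive_rate=0.01)
false_positive_rate = rate_at_or_above(clean, threshold)
detection_rate = rate_at_or_above(marked, threshold)
print(f"threshold={threshold:.3f} "
      f"false_positive_rate={false_positive_rate:.3f} "
      f"detection_rate={detection_rate:.3f}")
```

Tightening `max_false_positive_rate` pushes the threshold up and can lower the detection rate; loosening it does the opposite. In practice the scores would come from a held-out set of real watermarked and clean media rather than from a random generator.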
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM |
|---|---|---|---|---|---|
| Real-time | 2 | 5,457 | 1,338 | 238 | -5% |