PerTh Multimodal: Enterprise Multimodal Watermarking for Audio, Video, Image, and Text

Post Details

Company

Resemble AI

Date Published

June 24, 2026

Author

-

Word Count

2,402

Company Posts That Month

11

Language

English

Hacker News Points

-

Source URL

www.resemble.ai/resources/perth-multimodal-enterprise-multimodal-watermarking-for-audio-video-image-and-text

Summary

PerTh Multimodal is a watermarking solution built to meet the growing need for robust, imperceptible watermarks across audio, video, image, and text, driven in part by regulatory requirements such as the EU AI Act. Originally launched as an audio-only watermarker, PerTh has evolved into a multimodal platform with near-perfect accuracy in media watermark recovery. It embeds tamper-resistant signatures using modality-specific encoding: psychoacoustic methods for audio, pixel-level modifications for images and video, and linguistic rewriting for text. The new version addresses limitations of the original PerTh audio model through a novel training attack system and curriculum design that incorporates diverse data augmentations, making the watermark more robust to real-world transformations. ONNX support for edge deployment strengthens integration, supporting interoperability and compliance with legal mandates, and detailed threshold settings let operators balance detection rates against false positives. The platform also complements metadata solutions such as C2PA, providing dual-layer provenance so that content integrity survives re-encoding or format changes. Despite these advances, PerTh's explicit watermarking model still requires further evaluation on non-speech audio and in real-world settings before its potential and limitations are fully understood.
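The trade-off between detection rate and false positives mentioned in the summary can be illustrated with a minimal Python sketch. It does not use PerTh's API or its actual scoring; the function names, the score lists, and the rule "a score at or above the threshold counts as watermarked" are all assumptions made for illustration.

```python
# Hypothetical detector scores in [0, 1]; higher means "more likely watermarked".
# These numbers are made up for illustration and are not PerTh results.
clean_scores = [0.05, 0.10, 0.20, 0.30, 0.55]   # unwatermarked media
marked_scores = [0.45, 0.70, 0.80, 0.90, 0.95]  # watermarked media


def rates_at_threshold(clean, marked, threshold):
    """Return (false_positive_rate, detection_rate) at a given threshold."""
    fpr = sum(s >= threshold for s in clean) / len(clean)
    tpr = sum(s >= threshold for s in marked) / len(marked)
    return fpr, tpr


def pick_threshold(clean, marked, max_fpr):
    """Pick the lowest observed score whose false positive rate is within max_fpr.

    Raising the threshold can only lower the false positive rate, so the
    lowest qualifying threshold also gives the highest detection rate.
    """
    for t in sorted(set(clean) | set(marked)):
        fpr, tpr = rates_at_threshold(clean, marked, t)
        if fpr <= max_fpr:
            return t, fpr, tpr
    # No observed score meets the bound: flag nothing at all.
    return float("inf"), 0.0, 0.0


strict = pick_threshold(clean_scores, marked_scores, max_fpr=0.0)
lenient = pick_threshold(clean_scores, marked_scores, max_fpr=0.2)
print("strict :", strict)
print("lenient:", lenient)
```

On this toy data, tolerating no false positives pushes the threshold up and misses one watermarked sample, while allowing a 20% false positive rate recovers all of them. A real deployment would estimate these rates on a much larger held-out set.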

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	2	5,457	1,338	238	-5%