Background
The rapid development of the Internet has greatly facilitated the distribution of digital content, but it has also made unauthorized reproduction and copyright infringement easier, raising significant concerns about intellectual property protection. In response, robust image and audio watermarking, increasingly combined with deep learning techniques, has emerged as a way to embed information into digital content for ownership verification, piracy tracing, and copyright enforcement.
Current research directions
Traditional Watermarking: Traditional methods mainly embed watermarks in transform domains such as DCT, DWT, FFT, and SVD. They rely on hand-crafted rules, human visual system (HVS) modeling, and energy distribution analysis to maintain imperceptibility and basic robustness.
Machine Learning-based Watermarking: Machine learning-based methods can train the embedding and extraction networks jointly, end to end, and learn the frequency-spatial mapping automatically instead of relying on hand-crafted rules. CNN-, ViT-, and diffusion-based designs show stronger robustness under complex real-world distortions, platform re-encoding, and adaptive attacks.
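To make the transform-domain embedding used by traditional methods concrete, the following is a minimal sketch of one hand-crafted rule: quantization index modulation (QIM) on a single mid-frequency coefficient of each 8x8 DCT block. It is an illustration only, not the method of any publication listed here. The names `embed` and `extract`, the coefficient position `POS`, and the quantization step `STEP` are all assumptions chosen for the example.

```python
import numpy as np

BLOCK = 8
POS = (3, 2)    # assumed mid-frequency coefficient (hypothetical choice)
STEP = 24.0     # assumed quantization step; larger = more robust, more visible


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


D = dct_matrix(BLOCK)


def block_origins(shape):
    """Top-left corners of all complete 8x8 blocks, in raster order."""
    rows, cols = shape
    return [(r, c)
            for r in range(0, rows - BLOCK + 1, BLOCK)
            for c in range(0, cols - BLOCK + 1, BLOCK)]


def embed(image, bits):
    """Embed one bit per 8x8 block; returns a float image (not clipped)."""
    out = np.asarray(image, dtype=np.float64).copy()
    origins = block_origins(out.shape)
    if len(bits) > len(origins):
        raise ValueError("not enough 8x8 blocks for the payload")
    for bit, (r, c) in zip(bits, origins):
        coeffs = D @ out[r:r + BLOCK, c:c + BLOCK] @ D.T   # forward 2-D DCT
        half = int(bit) / 2.0
        # Snap the coefficient to the lattice STEP*(k + bit/2), k an integer.
        coeffs[POS] = STEP * (np.round(coeffs[POS] / STEP - half) + half)
        out[r:r + BLOCK, c:c + BLOCK] = D.T @ coeffs @ D   # inverse 2-D DCT
    return out


def extract(image, n_bits):
    """Blindly recover n_bits from the parity of the quantized coefficient."""
    img = np.asarray(image, dtype=np.float64)
    bits = []
    for r, c in block_origins(img.shape)[:n_bits]:
        coeffs = D @ img[r:r + BLOCK, c:c + BLOCK] @ D.T
        bits.append(int(np.round(2.0 * coeffs[POS] / STEP)) % 2)
    return np.array(bits)
```

In this sketch a bit survives any distortion that moves the chosen coefficient by less than `STEP / 4`, which is why raising `STEP` improves robustness at the cost of a larger change to the image.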
Challenges
Trade-off between Robustness and Imperceptibility: Traditional methods rely on handcrafted embedding rules, and increasing the embedding strength inevitably introduces visible artifacts or structural distortion. Machine learning-based watermarking improves imperceptibility through learned feature spaces, but robustness still drops sharply when high visual quality must be preserved.
Limited Embedding Capacity: Capacity is fundamentally constrained in traditional frequency-domain approaches because only limited coefficient bands can be safely modified. Machine learning-based approaches attempt to boost capacity through semantic or feature-level embedding, but they still suffer severe degradation when attacks disrupt learned semantic alignment.
Combined and Unknown Geometric Attacks: Traditional synchronization templates are usually designed for single known geometric transforms, but fail under compound or unknown transformations. Machine learning-based extraction improves tolerance, yet still collapses under real-world mixed distortions that differ from synthetic training distributions.
Adversarial Vulnerability: Traditional watermarking is vulnerable to intentionally optimized distortions once the embedding rules are known. Machine learning-based watermarking can be even more fragile, because small adversarial perturbations can directly break the extraction network without perceptible quality loss.
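The robustness-imperceptibility trade-off listed first among these challenges can be observed in a toy experiment. The sketch below assumes a simple additive spread-spectrum watermark on a synthetic host image, with Gaussian noise standing in for an attack. The function names, the host image, the strength values `alphas`, and the noise level are all hypothetical choices for illustration. It reports PSNR (imperceptibility) and bit error rate (robustness) for each embedding strength.

```python
import numpy as np


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def embed(host, bits, patterns, alpha):
    """Add each +/-1 pattern with sign +1 for bit 1 and -1 for bit 0."""
    signs = 2.0 * np.asarray(bits) - 1.0
    return host + alpha * np.tensordot(signs, patterns, axes=1)


def extract(image, patterns):
    """Blind detection: sign of the correlation with each pattern."""
    centered = image - image.mean()
    scores = np.tensordot(patterns, centered, axes=2)
    return (scores > 0).astype(int)


def sweep(alphas=(0.5, 1.0, 2.0, 4.0, 8.0), noise_std=20.0, seed=0):
    """Return (alpha, PSNR in dB, bit error rate) for each strength."""
    rng = np.random.default_rng(seed)
    size, n_bits = 64, 8
    y, x = np.mgrid[0:size, 0:size]
    host = 128.0 + 20.0 * np.sin(x / 5.0) * np.cos(y / 7.0)  # synthetic host
    bits = rng.integers(0, 2, n_bits)
    patterns = rng.choice([-1.0, 1.0], size=(n_bits, size, size))
    rows = []
    for alpha in alphas:
        marked = embed(host, bits, patterns, alpha)
        attacked = marked + rng.normal(0.0, noise_std, marked.shape)
        ber = float(np.mean(extract(attacked, patterns) != bits))
        rows.append((alpha, psnr(host, marked), ber))
    return rows


if __name__ == "__main__":
    for alpha, quality, ber in sweep():
        print(f"alpha={alpha:4.1f}  PSNR={quality:6.2f} dB  BER={ber:.3f}")
```

In this setup, doubling the strength lowers PSNR by about 6 dB (a factor of four in mean squared error), while the correlation margin available to the detector grows in proportion to the strength.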
Demo
Watermark Embedding Procedure
Related Publications
Qiutong Li, Tong Liu, Xiaochen Yuan
IEEE Transactions on Audio, Speech and Language Processing, 2025, pp. 1-12