Hyperdimensional Multimedia Perception and Frontier Security

Journal article

Qiutong Li, Tong Liu, Xiaochen Yuan
IEEE Transactions on Audio, Speech and Language Processing, 2025, pp. 1-12

DOI: 10.1109/TASLPRO.2025.3624964

Cite

APA
Li, Q., Liu, T., & Yuan, X. (2025). RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking. IEEE Transactions on Audio, Speech and Language Processing, 1–12. https://doi.org/10.1109/TASLPRO.2025.3624964

Chicago/Turabian
Li, Qiutong, Tong Liu, and Xiaochen Yuan. “RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking.” IEEE Transactions on Audio, Speech and Language Processing (2025): 1–12.

MLA
Li, Qiutong, et al. “RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking.” IEEE Transactions on Audio, Speech and Language Processing, 2025, pp. 1–12, doi:10.1109/TASLPRO.2025.3624964.

BibTeX

@article{li2025a,
  title = {RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking},
  year = {2025},
  journal = {IEEE Transactions on Audio, Speech and Language Processing},
  pages = {1--12},
  doi = {10.1109/TASLPRO.2025.3624964},
  author = {Li, Qiutong and Liu, Tong and Yuan, Xiaochen}
}

Abstract: Robust audio watermarking plays a crucial role in copyright protection; however, existing techniques suffer from low embedding capacity and limited robustness under severe signal distortions. To address these limitations, this paper proposes a Robust Neural-Guided Parallel Multi-Watermarking (RNPM) scheme. Within RNPM, a U-Net-Based Embedding Region Selection (ERSU-Net) module accurately locates multiple embedding regions based on their robustness characteristics. To better exploit the intrinsic frequency and energy distribution of audio signals, the ERSU-Net module is enhanced with dual-attention modules, which improves robustness. Once the embedding regions are determined, they are segmented into multiple overlapping frames to facilitate embedding. To further increase embedding capacity without compromising robustness, RNPM combines Discrete Cosine Transform (DCT) and inter-frame difference-based embedding with Gram–Schmidt orthogonalization, enabling parallel multi-watermark embedding. Furthermore, to mitigate extraction errors caused by signal distortion, an error correction mechanism is integrated with the localized embedding regions, improving overall extraction reliability. Experimental results demonstrate that RNPM achieves superior robustness and inaudibility. In particular, RNPM maintains a Bit Error Rate (BER) of 0 under 20% cropping, MP3 compression at 64 kbps, and 22.5 kHz resampling attacks, surpassing existing state-of-the-art methods.
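
The abstract names the building blocks of the embedding stage (DCT, inter-frame differences, Gram–Schmidt orthogonalization) and the BER metric without giving details. The following is a minimal illustrative sketch of how orthogonal carriers can let several watermark bits share one pair of frames. It is not the authors' RNPM algorithm. The function names, the key-seeded random carriers, the frame length of 32, the `strength` value, and the rule "set the projection of the DCT-domain frame difference onto each carrier to plus or minus `strength`" are all assumptions made for illustration. Region selection (ERSU-Net), overlapping frames, and error correction are not modeled.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize the input vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            p = dot(w, b)
            w = [wi - p * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def make_carriers(count, length, key):
    """Hypothetical key-seeded carriers, one per parallel watermark."""
    rng = random.Random(key)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(count)]
    return gram_schmidt(raw)


def embed(frame_a, frame_b, bits, carriers, strength=0.5):
    """Embed one bit per carrier into the DCT-domain difference of two frames."""
    a, b = dct(frame_a), dct(frame_b)
    diff = [x - y for x, y in zip(a, b)]
    for bit, u in zip(bits, carriers):
        target = strength if bit else -strength
        # Carriers are orthonormal, so moving along one carrier leaves the
        # projections onto the others unchanged; the original diff suffices.
        shift = target - dot(diff, u)
        a = [x + 0.5 * shift * ui for x, ui in zip(a, u)]
        b = [y - 0.5 * shift * ui for y, ui in zip(b, u)]
    return idct(a), idct(b)


def extract(frame_a, frame_b, carriers):
    """Blind extraction: the sign of each projection of the frame difference."""
    a, b = dct(frame_a), dct(frame_b)
    diff = [x - y for x, y in zip(a, b)]
    return [1 if dot(diff, u) > 0 else 0 for u in carriers]


def bit_error_rate(sent, received):
    """Fraction of bit positions that differ."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


# Toy host signal: two adjacent 32-sample frames of seeded noise.
rng = random.Random(0)
frame_a = [rng.uniform(-1.0, 1.0) for _ in range(32)]
frame_b = [rng.uniform(-1.0, 1.0) for _ in range(32)]
carriers = make_carriers(3, 32, key=1234)
bits = [1, 0, 1]  # one bit from each of three parallel watermarks
marked_a, marked_b = embed(frame_a, frame_b, bits, carriers)
recovered = extract(marked_a, marked_b, carriers)
```

In this toy setting, with no attack applied, `recovered` matches `bits`, so `bit_error_rate(bits, recovered)` is zero. A larger `strength` would trade inaudibility for a wider decision margin under distortion. How RNPM actually balances the two is described in the paper itself, not here.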