Hyperdimensional Multimedia Perception and Frontier Security

Faculty of Applied Sciences, Macao Polytechnic University

RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking


Journal article


Qiutong Li, Tong Liu, Xiaochen Yuan
IEEE Transactions on Audio, Speech and Language Processing, 2025, pp. 1-12


Cite

APA
Li, Q., Liu, T., & Yuan, X. (2025). RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking. IEEE Transactions on Audio, Speech and Language Processing, 1–12. https://doi.org/10.1109/TASLPRO.2025.3624964


Chicago/Turabian
Li, Qiutong, Tong Liu, and Xiaochen Yuan. “RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking.” IEEE Transactions on Audio, Speech and Language Processing (2025): 1–12.


MLA
Li, Qiutong, et al. “RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking.” IEEE Transactions on Audio, Speech and Language Processing, 2025, pp. 1–12, doi:10.1109/TASLPRO.2025.3624964.


BibTeX

@article{li2025a,
  title = {RNPM: Neural-Guided Embedding Region Selection and Error Correction for Robust Audio Multi-Watermarking},
  year = {2025},
  journal = {IEEE Transactions on Audio, Speech and Language Processing},
  pages = {1-12},
  doi = {10.1109/TASLPRO.2025.3624964},
  author = {Li, Qiutong and Liu, Tong and Yuan, Xiaochen}
}

Abstract: Robust audio watermarking plays a crucial role in copyright protection; however, existing techniques suffer from low embedding capacity and limited robustness under severe signal distortions. To address these limitations, this paper proposes a Robust Neural-Guided Parallel Multi-Watermarking (RNPM) scheme. Within RNPM, a U-Net-Based Embedding Region Selection (ERSU-Net) module locates multiple embedding regions based on their robustness characteristics. To better exploit the intrinsic frequency and energy distribution of audio signals, the ERSU-Net module is enhanced with dual-attention modules, which improves robustness. Once the embedding regions have been determined, each region is segmented into multiple overlapping frames to facilitate embedding. To further increase embedding capacity without compromising robustness, RNPM combines Discrete Cosine Transform (DCT) and inter-frame difference-based embedding with Gram–Schmidt orthogonalization, enabling parallel multi-watermark embedding. Furthermore, to mitigate extraction errors caused by signal distortion, an error correction mechanism is integrated with the localized embedding regions, improving overall extraction reliability. Experimental results demonstrate that the proposed RNPM achieves superior robustness and inaudibility. In particular, RNPM achieves a Bit Error Rate (BER) of 0 under 20% cropping, MP3 compression at 64 kbps, and 22.5 kHz resampling attacks, surpassing existing state-of-the-art methods.
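
Illustrative sketch (not the authors' implementation): the abstract does not give the embedding equations, so the Python below is only a minimal, assumed illustration of why orthogonalized carriers allow several watermark bits to share the same coefficients. It treats two plain lists as stand-ins for the DCT coefficients of adjacent frames, builds orthonormal carriers with Gram–Schmidt, and encodes each bit in the sign of the inter-frame difference projected onto one carrier. The function names (gram_schmidt, embed, extract, bit_error_rate), the strength parameter, and the sign-based decision rule are all hypothetical choices made for this example; the ERSU-Net region selection and the error correction stage are not modeled.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def embed(frame_a, frame_b, bits, carriers, strength=1.0):
    """Modify frame_a so that (frame_a - frame_b) projected onto carrier k
    equals +strength for bit 1 and -strength for bit 0.

    Assumption for this sketch: frame_a and frame_b hold transform-domain
    coefficients (for example DCT) of two adjacent frames.
    """
    out = list(frame_a)
    for bit, c in zip(bits, carriers):
        target = strength if bit else -strength
        current = dot([a - b for a, b in zip(out, frame_b)], c)
        # Moving along one unit-norm carrier leaves the projections onto
        # the other (orthogonal) carriers unchanged.
        out = [o + (target - current) * ci for o, ci in zip(out, c)]
    return out


def extract(frame_a, frame_b, carriers):
    """Read each bit from the sign of the projected inter-frame difference."""
    diff = [a - b for a, b in zip(frame_a, frame_b)]
    return [1 if dot(diff, c) > 0 else 0 for c in carriers]


def bit_error_rate(sent, received):
    """Fraction of bit positions that differ."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


rng = random.Random(0)
n = 16  # number of coefficients per frame (arbitrary for the demo)
frame_a = [rng.gauss(0, 1) for _ in range(n)]
frame_b = [rng.gauss(0, 1) for _ in range(n)]
carriers = gram_schmidt([[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)])

bits = [1, 0, 1, 1]
marked = embed(frame_a, frame_b, bits, carriers)
recovered = extract(marked, frame_b, carriers)
print(bit_error_rate(bits, recovered))  # no attack applied: prints 0.0
```

Because the carriers are mutually orthogonal, each bit is written and read independently of the others, which is the property that makes parallel embedding possible in this toy setting. Under real distortions such as compression or resampling the projections would shift, which is where a larger strength or an error correction code would come into play.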