Hyperdimensional Multimedia Perception and Frontier Security

Faculty of Applied Sciences, Macao Polytechnic University

TransHFC: Joints Hypergraph Filtering Convolution and Transformer Framework for Temporal Forgery Localization


Journal article


Jiahao Huang, Xiaochen Yuan, Chan-Tong Lam, Sio-Kei Im, Fangyuan Lei, Xiuli Bi
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, 2025, pp. 9261-9275


Cite

APA
Huang, J., Yuan, X., Lam, C.-T., Im, S.-K., Lei, F., & Bi, X. (2025). TransHFC: Joints Hypergraph Filtering Convolution and Transformer Framework for Temporal Forgery Localization. IEEE Transactions on Circuits and Systems for Video Technology, 35, 9261–9275. https://doi.org/10.1109/TCSVT.2025.3559624


Chicago/Turabian
Huang, Jiahao, Xiaochen Yuan, Chan-Tong Lam, Sio-Kei Im, Fangyuan Lei, and Xiuli Bi. “TransHFC: Joints Hypergraph Filtering Convolution and Transformer Framework for Temporal Forgery Localization.” IEEE Transactions on Circuits and Systems for Video Technology 35 (2025): 9261–9275.


MLA
Huang, Jiahao, et al. “TransHFC: Joints Hypergraph Filtering Convolution and Transformer Framework for Temporal Forgery Localization.” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, 2025, pp. 9261–75, doi:10.1109/TCSVT.2025.3559624.


BibTeX

@article{huang2025a,
  title = {TransHFC: Joints Hypergraph Filtering Convolution and Transformer Framework for Temporal Forgery Localization},
  year = {2025},
  journal = {IEEE Transactions on Circuits and Systems for Video Technology},
  pages = {9261-9275},
  volume = {35},
  doi = {10.1109/TCSVT.2025.3559624},
  author = {Huang, Jiahao and Yuan, Xiaochen and Lam, Chan-Tong and Im, Sio-Kei and Lei, Fangyuan and Bi, Xiuli}
}

Figure: Overview of the proposed TransHFC.
Abstract: The authenticity of audio-visual content is being challenged by advanced multimedia editing technologies driven by Artificial Intelligence-Generated Content (AIGC). Temporal forgery localization aims to detect suspicious content by locating forged segments. So far, most existing methods are based on Convolutional Neural Networks (CNNs) or Transformers, yet neither fully considers the complex relationships within forged audio-visual content. To address this issue, in this paper we propose a novel method, named TransHFC, which innovatively introduces hypergraphs to model group relationships among segments while capturing point-to-point relationships through Transformers. Through its dual hypergraph filtering convolution branch, TransHFC captures group relationships at both the temporal and spatial levels, enhancing the representation of forged segment features. Furthermore, we propose a new hypergraph filtering convolution Auto-Encoder that uses a multi-frequency filter bank for adaptive signal capture. This design compensates for the limitations of a single hypergraph filter. Our extensive experiments on the LAV-DF, TVIL, Psynd, and HAD datasets demonstrate that TransHFC achieves state-of-the-art performance.
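
To make the abstract's core operation concrete, the following is a minimal sketch of a hypergraph filtering convolution with a multi-frequency filter bank, written in PyTorch. It is illustrative only, not the authors' implementation: the function and class names (hypergraph_laplacian, MultiFrequencyHypergraphConv), the three-band low/all/high-pass decomposition, and all hyperparameters are assumptions used to show one plausible reading of the idea.

# Minimal sketch (not the paper's code) of a hypergraph filtering convolution
# with a multi-frequency filter bank. Names, band definitions, and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn


def hypergraph_laplacian(H: torch.Tensor) -> torch.Tensor:
    """Normalized hypergraph Laplacian L = I - Dv^-1/2 H De^-1 H^T Dv^-1/2.

    H: (N, E) incidence matrix; unit hyperedge weights are assumed here.
    """
    Dv = H.sum(dim=1).clamp(min=1e-6)          # vertex degrees, shape (N,)
    De = H.sum(dim=0).clamp(min=1e-6)          # hyperedge degrees, shape (E,)
    Dv_inv_sqrt = torch.diag(Dv.pow(-0.5))
    De_inv = torch.diag(De.pow(-1.0))
    A = Dv_inv_sqrt @ H @ De_inv @ H.t() @ Dv_inv_sqrt
    return torch.eye(H.size(0)) - A


class MultiFrequencyHypergraphConv(nn.Module):
    """Sum of spectral filter branches g_k(L) X Theta_k.

    Each branch emphasizes a different frequency band by shifting the
    Laplacian (low-pass: I - L, all-pass: I, high-pass: L), which is one
    plausible instantiation of an adaptive multi-frequency filter bank.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(3)
        )
        self.band_gates = nn.Parameter(torch.ones(3))  # learnable band importance

    def forward(self, X: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        I = torch.eye(L.size(0), device=X.device)
        bands = [I - L, I, L]                  # low-, all-, high-frequency responses
        out = 0.0
        for gate, band, lin in zip(self.band_gates, bands, self.weights):
            out = out + gate * lin(band @ X)
        return torch.relu(out)


if __name__ == "__main__":
    N, E, D = 8, 4, 16                         # segments, hyperedges, feature dim
    H = (torch.rand(N, E) > 0.5).float()       # random incidence matrix (illustrative)
    X = torch.randn(N, D)                      # per-segment features
    layer = MultiFrequencyHypergraphConv(D, 32)
    print(layer(X, hypergraph_laplacian(H)).shape)  # torch.Size([8, 32])

In this sketch, each segment is a vertex and each hyperedge groups several segments, so one filtering step mixes features across an entire group rather than along pairwise edges; the learnable band gates let the layer weight low- and high-frequency components adaptively instead of relying on a single fixed hypergraph filter.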