Hyperdimensional Multimedia Perception and Frontier Security

Faculty of Applied Sciences, Macao Polytechnic University

DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation through Adaptive Frequency Transformer and Prototype Learning


Journal article


Yan Xiang, Kaiqi Zhao, Zhenghong Yu, Xiaochen Yuan, Guoheng Huang, Jinyu Tian, Jianqing Li
IEEE Transactions on Circuits and Systems for Video Technology, 2025, pp. 1-1


Cite

APA
Xiang, Y., Zhao, K., Yu, Z., Yuan, X., Huang, G., Tian, J., & Li, J. (2025). DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation through Adaptive Frequency Transformer and Prototype Learning. IEEE Transactions on Circuits and Systems for Video Technology, 1–1. https://doi.org/10.1109/TCSVT.2025.3601659


Chicago/Turabian
Xiang, Yan, Kaiqi Zhao, Zhenghong Yu, Xiaochen Yuan, Guoheng Huang, Jinyu Tian, and Jianqing Li. “DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation through Adaptive Frequency Transformer and Prototype Learning.” IEEE Transactions on Circuits and Systems for Video Technology (2025): 1–1.


MLA
Xiang, Yan, et al. “DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation through Adaptive Frequency Transformer and Prototype Learning.” IEEE Transactions on Circuits and Systems for Video Technology, 2025, pp. 1–1, doi:10.1109/TCSVT.2025.3601659.


BibTeX

@article{xiang2025a,
  title = {DFFormer: Capturing Dynamic Frequency Features to Locate Image Manipulation through Adaptive Frequency Transformer and Prototype Learning},
  year = {2025},
  journal = {IEEE Transactions on Circuits and Systems for Video Technology},
  pages = {1-1},
  doi = {10.1109/TCSVT.2025.3601659},
  author = {Xiang, Yan and Zhao, Kaiqi and Yu, Zhenghong and Yuan, Xiaochen and Huang, Guoheng and Tian, Jinyu and Li, Jianqing}
}

Figure: Overview of DFFormer
Abstract: The proliferation of modern image editing tools has raised concerns about image manipulation, particularly its potential to mislead the public and compromise privacy and security. Consequently, detecting and localizing tampered regions has become a critical research challenge. Traditional methods struggle with subtle manipulations, such as splicing, copy-move, and removal, which are often more discernible in the frequency domain than in the spatial domain. Additionally, the size imbalance between tampered and background regions further complicates detection. To address these challenges, we propose DFFormer, an end-to-end network that leverages frequency feature differences and a dynamic token strategy for precise manipulation localization. DFFormer combines a Convolutional Neural Network (CNN) and a Transformer in a hybrid architecture with three key modules: the Adaptive Frequency Transformer (AFT), the Prototype Learning Module (PLM), and the Cascaded Progressive Token Fusion Head (CPTF-Head). AFT integrates high- and low-frequency components into self-attention via the Parallel Adaptive Frequency Attention (PAFA) block, enhancing tampering feature representation while preserving fine details. PLM employs k-nearest-neighbor-based density peak clustering (DPC-KNN) and weighted token aggregation to optimize dynamic token reduction. CPTF-Head adopts a hierarchical coarse-to-fine strategy to integrate multiscale features, improving localization accuracy and edge refinement. Experiments demonstrate that DFFormer outperforms state-of-the-art models on four benchmark datasets and one real-world dataset, exhibiting superior generalization and robustness. The source code is publicly available at https://github.com/XiangGD/DFFormer.git.
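The PLM described in the abstract reduces tokens with DPC-KNN (density peak clustering over k-nearest-neighbor distances) followed by weighted aggregation. The sketch below is an illustrative NumPy rendering of that general idea, not the authors' implementation; the function name, the exponential density estimate, and the density-weighted averaging are my assumptions for illustration.

```python
import numpy as np

def dpc_knn_token_reduction(tokens, num_clusters, k=5):
    """Illustrative DPC-KNN token reduction (hypothetical sketch, not DFFormer's PLM).

    tokens: (N, D) array of token embeddings.
    Returns a (num_clusters, D) array of merged prototype tokens.
    """
    n = tokens.shape[0]
    # Pairwise Euclidean distances between all tokens.
    dist = np.linalg.norm(tokens[:, None, :] - tokens[None, :, :], axis=-1)
    # Local density: decays with the mean distance to the k nearest neighbors
    # (column 0 of the sorted distances is the token itself, so skip it).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-knn_dist.mean(axis=1))
    # Separation delta: distance to the nearest token of strictly higher density.
    delta = np.empty(n)
    for i in range(n):
        higher = density > density[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    # Density peaks (cluster centers) maximize density * separation.
    centers = np.argsort(density * delta)[-num_clusters:]
    # Assign every token to its nearest center.
    assign = centers[np.argmin(dist[:, centers], axis=1)]
    # Weighted aggregation: density-weighted average within each cluster.
    merged = np.stack([
        np.average(tokens[assign == c], axis=0, weights=density[assign == c])
        for c in centers
    ])
    return merged
```

Each center is its own nearest center, so every cluster is non-empty and the output always has exactly `num_clusters` rows.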
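The AFT separates high- and low-frequency components before feeding them to attention. As background, a minimal way to split an image into such components is a circular low-pass mask in the 2-D FFT domain; the sketch below shows only this generic decomposition under my own assumptions (function name, mask shape, cutoff radius), not the PAFA block itself.

```python
import numpy as np

def split_frequency_bands(image, radius=8):
    """Illustrative low/high frequency split via FFT (sketch, not DFFormer's AFT).

    image: 2-D grayscale array. Returns (low, high) spatial-domain components
    such that low + high reconstructs the input exactly.
    """
    h, w = image.shape
    # Shift the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Circular low-pass mask around the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    # Low-frequency component: inverse FFT of the masked spectrum.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    # High-frequency residual carries edges and fine tampering traces.
    high = image - low
    return low, high
```

Tampering artifacts such as splicing boundaries tend to concentrate in the high-frequency residual, which is why frequency-aware attention can expose edits that look seamless in the spatial domain.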