Multi-view Document Image Tampering Detection and Localization Based on Multi-scale Fusion Attention

doi:10.11896/jsjkx.240100142

Computer Science, 2025, Vol. 52, Issue (4): 327-335. doi: 10.11896/jsjkx.240100142


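MENG Sijiang, WANG Hongxia, ZENG Qiang, ZHOU Yang

School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China

Received: 2024-01-17 Revised: 2024-07-01 Online: 2025-04-15 Published: 2025-04-14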
Corresponding author: WANG Hongxia (hxwang@scu.edu.cn)
About the author: (mengsijiang@stu.scu.edu.cn)
Funding:
National Natural Science Foundation of China (62272331)

Multi-view and Multi-scale Fusion Attention Network for Document Image Forgery Localization

MENG Sijiang, WANG Hongxia, ZENG Qiang, ZHOU Yang

School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China

Received: 2024-01-17 Revised: 2024-07-01 Online: 2025-04-15 Published: 2025-04-14
About author: MENG Sijiang, born in 1998, postgraduate. Her main research interests include multimedia security and digital image forensics.
WANG Hongxia, born in 1973, Ph.D., professor, Ph.D. supervisor. Her main research interests include multimedia security, information hiding, digital watermarking, digital forensics and intelligent information processing.
Supported by:
National Natural Science Foundation of China (62272331).

Abstract

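Summary: With the maturing and widespread adoption of digital platforms, document images now circulate widely online. At the same time, advances in image processing technology have increased the risk that document images are tampered with, making it crucial to safeguard their integrity and authenticity. To improve the accuracy of tampered-region localization for document images in real-world scenarios, this paper proposes a multi-view document image tampering detection and localization method based on multi-scale fusion attention, the Multi-View and Multi-Scale Fusion Attention Network (MM-Net). MM-Net uses a multi-view encoder that combines the RGB image, noise information, and character feature information to fully mine tampering features. In addition, MM-Net includes a multi-scale fusion attention module that enables interaction among features at different scales and enhances key content information in document images, thereby improving the precision of tampered-region localization. Extensive experiments on the large-scale DocTamper dataset show that MM-Net localizes tampered regions more precisely, reaching F1 scores of 0.809, 0.807, and 0.774 on the test set and the cross-domain sets FCD and SCD, respectively, and that it generalizes well and is robust.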

Keywords: Document image tampering detection, Deep learning, Multi-scale, Digital image forensics, Multi-view

Abstract: With the improvement and widespread application of digital platforms, document images have spread widely on the Internet. At the same time, advances in image processing technology have increased the risk of document image tampering, making it crucial to ensure the integrity and authenticity of document images. In this paper, we propose the multi-view and multi-scale fusion attention network (MM-Net), which aims to improve the accuracy of document image forgery localization in real-world scenarios. We adopt a multi-view encoder that combines RGB information, noise information, and character information to fully extract tampering features. A multi-scale fusion attention module is designed to facilitate the interaction of multi-scale features, thus enhancing important content information in document images. Extensive experimental results on the large-scale dataset DocTamper demonstrate that the proposed MM-Net achieves more precise localization of tampered regions in document images, with F-scores of 0.809, 0.807, and 0.774 on the test dataset and the cross-domain datasets FCD and SCD, respectively. Moreover, MM-Net exhibits good generalizability and robustness.

Key words: Document image forgery detection, Deep learning, Multi-scale, Digital image forensics, Multi-view
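
The F-scores quoted in the abstract measure how well a predicted tampering mask overlaps the ground-truth mask. The abstract does not say how the score is aggregated, so the following is only a minimal sketch that assumes a pixel-level F1 on binary masks (1 = tampered, 0 = authentic) for a single image. The function name `pixel_f1` and the toy masks are hypothetical and are not taken from the paper.

```python
def pixel_f1(pred_mask, true_mask):
    """Return (precision, recall, f1) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values, where 1 marks a tampered pixel.
    This is an illustrative assumption, not the paper's evaluation code.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # tampered pixel correctly flagged
            elif p and not t:
                fp += 1      # authentic pixel wrongly flagged
            elif t and not p:
                fn += 1      # tampered pixel missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy 2x4 masks: two true positives, one false positive, one false negative.
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
true = [[0, 1, 1, 1],
        [0, 0, 0, 0]]
precision, recall, f1 = pixel_f1(pred, true)
```

In this toy case precision and recall are both 2/3, so the F1 is also 2/3. A dataset-level figure such as those in the abstract would additionally depend on how per-image scores are averaged, which the abstract leaves unspecified.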

CLC number: TP391

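MENG Sijiang, WANG Hongxia, ZENG Qiang, ZHOU Yang. Multi-view Document Image Tampering Detection and Localization Based on Multi-scale Fusion Attention[J]. Computer Science, 2025, 52(4): 327-335. https://doi.org/10.11896/jsjkx.240100142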

MENG Sijiang, WANG Hongxia, ZENG Qiang, ZHOU Yang. Multi-view and Multi-scale Fusion Attention Network for Document Image Forgery Localization[J]. Computer Science, 2025, 52(4): 327-335. https://doi.org/10.11896/jsjkx.240100142

