Computer Science, 2023, Vol. 50, Issue (1): 114-122. doi: 10.11896/jsjkx.211100269

• Computer Graphics & Multimedia •

Multitask Transformer-based Network for Image Splicing Manipulation Detection

ZHANG Jingyuan, WANG Hongxia, HE Peisong

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2021-12-29  Revised: 2022-03-01  Online: 2023-01-15  Published: 2023-01-09
  • Corresponding author: HE Peisong (gokeyhps@scu.edu.cn)
  • About author: ZHANG Jingyuan, born in 1996, postgraduate (jyzhang.z@foxmail.com). Her main research interests include digital image forensics and deep learning.
    HE Peisong, born in 1991, Ph.D., associate professor. His main research interests include multimedia security and deep learning.
  • Supported by:
    Science and Technology Program of Sichuan Province (2022YFG0320), National Natural Science Foundation of China (61902263, 61972269), Fundamental Research Funds for the Central Universities of the Ministry of Education of China (YJ201881, 2020SCU12066) and China Postdoctoral Science Foundation (2020M673276).


Abstract: Most existing deep learning-based methods for image splicing forgery detection use convolutional layers to extract forensic features. However, a convolution kernel performs a local computation with a limited receptive field. Moreover, existing methods mainly use the location of tampered regions to guide model training, which makes it difficult to learn richer tampering-trace features. To overcome these limitations, a multitask Transformer-based network (MT-Net) is proposed for image splicing detection and localization. The encoder uses the self-attention mechanism of the Transformer to learn correlations between pixels. This lets the network assign different levels of attention to different pixels and focus on tampering traces. MT-Net also handles three subtasks simultaneously: tampered edge detection, tampered area detection, and prediction of the proportion of the image that has been tampered. Together, these subtasks guide the network to expose tampering traces from both local and global information. A dedicated loss function is designed for each subtask to better optimize the network during training. In experiments, MT-Net achieves better detection results than other state-of-the-art methods on three publicly available datasets, CASIA v2.0, Columbia and IMD2020, with F1 scores of 0.808, 0.913 and 0.675, respectively. Visualizations of the detection results also show that the proposed method localizes splicing regions well.
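The abstract states that MT-Net's encoder uses Transformer self-attention to learn correlations between pixels, in contrast to the local window of a convolution. The sketch below illustrates only that general idea. It implements single-head scaled dot-product self-attention over a flattened feature map in plain Python.

Several choices here are assumptions for illustration and are not MT-Net's actual encoder, whose details are not given here:

- the projection matrices `w_q`, `w_k`, `w_v`,
- the use of a single head,
- the absence of positional encoding,
- the helper names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (len(a[0]) must equal len(b))."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def flatten_feature_map(fmap) -> Matrix:
    """H x W x C nested list -> (H*W) x C token matrix, row-major order."""
    return [pixel for row in fmap for pixel in row]


def self_attention(tokens: Matrix, w_q: Matrix, w_k: Matrix,
                   w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention (illustrative only).

    tokens: N x C, one row per pixel. Returns (output N x d_v, attention N x N).
    Row i of the attention matrix weights every pixel in the map, so each
    output row mixes information from the whole image, not a local window.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    d_k = len(w_q[0])
    scores = [[x / math.sqrt(d_k) for x in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)
    return matmul(attn, v), attn
```

Each row of `attn` spans all H×W positions. This is the global receptive field that the abstract contrasts with convolution. The price is memory and compute that grow quadratically with the number of pixels.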
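All three subtasks named in the abstract (tampered area, tampered edge, tampered proportion) can be supervised from a single binary ground-truth mask. The sketch below assumes:

- plain binary cross-entropy for the area and edge maps,
- an absolute error for the proportion,
- an edge defined as any pixel whose 4-neighbour has a different label, so both sides of the boundary are marked,
- hypothetical weights `w_region`, `w_edge` and `w_ratio`.

The paper designs its own loss for each subtask and this abstract does not specify them, so all of the above are stand-ins.

```python
import math
from typing import List

Mask = List[List[int]]
Prob = List[List[float]]


def edge_map(mask: Mask) -> Mask:
    """Hypothetical edge target: 1 where any 4-neighbour has a different label."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] != mask[i][j]:
                    edges[i][j] = 1
                    break
    return edges


def tampered_ratio(mask: Mask) -> float:
    """Fraction of pixels labelled as tampered."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def bce(pred: Prob, target: Mask, eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross-entropy, with probabilities clipped to [eps, 1-eps]."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n


def multitask_loss(region_pred: Prob, edge_pred: Prob, ratio_pred: float,
                   gt_mask: Mask, w_region: float = 1.0, w_edge: float = 1.0,
                   w_ratio: float = 1.0) -> float:
    """Weighted sum of stand-in losses for the three subtasks (weights hypothetical)."""
    l_region = bce(region_pred, gt_mask)
    l_edge = bce(edge_pred, edge_map(gt_mask))
    l_ratio = abs(ratio_pred - tampered_ratio(gt_mask))
    return w_region * l_region + w_edge * l_edge + w_ratio * l_ratio
```

Deriving the edge and proportion targets from the mask means no extra annotation is needed. The area and edge terms act on individual pixels (local refinement), while the proportion term summarizes the whole image (global perception).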
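The abstract does not say whether the reported F1 scores are pixel-level. If they are, the per-image computation looks roughly like the sketch below. It thresholds a predicted probability map and compares it with a binary mask.

The 0.5 threshold is an assumption, as is returning 0.0 when there are no true positives. How scores are aggregated over a dataset (e.g., averaged per image) is not stated here and is left out.

```python
from typing import List


def pixel_f1(pred: List[List[float]], gt: List[List[int]],
             threshold: float = 0.5) -> float:
    """Pixel-level F1 for one image; threshold=0.5 is an assumed default."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            predicted = p >= threshold
            if predicted and g:
                tp += 1
            elif predicted and not g:
                fp += 1
            elif not predicted and g:
                fn += 1
    if tp == 0:
        # Convention chosen here: no true positives -> F1 of 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `pixel_f1([[0.9, 0.2], [0.7, 0.1]], [[1, 0], [0, 0]])` has one true positive and one false positive. That gives precision 0.5, recall 1.0 and an F1 of 2/3.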

Key words: Digital image forensics, Image splicing detection, Transformer, Self-attention mechanism, Multitask network

CLC number: TP391
Related articles:
[1] 蔡肖, 陈志华, 盛斌.
SPT: Swin Pyramid Transformer for Object Detection of Remote Sensing
Computer Science, 2023, 50(1): 105-113. https://doi.org/10.11896/jsjkx.211100208
[2] 汪鸣, 彭舰, 黄飞虎.
Multi-time Scale Spatial-Temporal Graph Neural Network for Traffic Flow Prediction
Computer Science, 2022, 49(8): 40-48. https://doi.org/10.11896/jsjkx.220100188
[3] 金方焱, 王秀利.
Implicit Causality Extraction of Financial Events Integrating RACNN and BiLSTM
Computer Science, 2022, 49(7): 179-186. https://doi.org/10.11896/jsjkx.210500190
[4] 康雁, 徐玉龙, 寇勇奇, 谢思宇, 杨学昆, 李浩.
Drug-Drug Interaction Prediction Based on Transformer and LSTM
Computer Science, 2022, 49(6A): 17-21. https://doi.org/10.11896/jsjkx.210400150
[5] 张嘉淏, 刘峰, 齐佳音.
Lightweight Micro-expression Recognition Architecture Based on Bottleneck Transformer
Computer Science, 2022, 49(6A): 370-377. https://doi.org/10.11896/jsjkx.210500023
[6] 赵小虎, 叶圣, 李晓.
Multi-algorithm Fusion Behavior Classification Method for Body Bone Information Reconstruction
Computer Science, 2022, 49(6): 269-275. https://doi.org/10.11896/jsjkx.210500070
[7] 赵丹丹, 黄德根, 孟佳娜, 董宇, 张攀.
Chinese Entity Relations Classification Based on BERT-GRU-ATT
Computer Science, 2022, 49(6): 319-325. https://doi.org/10.11896/jsjkx.210600123
[8] 陆亮, 孔芳.
Dialogue-based Entity Relation Extraction with Knowledge
Computer Science, 2022, 49(5): 200-205. https://doi.org/10.11896/jsjkx.210300198
[9] 李川, 李维华, 王迎晖, 陈伟, 文俊颖.
Gated Two-tower Transformer-based Model for Predicting Antigenicity of Influenza H1N1
Computer Science, 2022, 49(11A): 211000209-6. https://doi.org/10.11896/jsjkx.211000209
[10] 王帅, 张淑军, 叶康, 郭淇.
Continuous Sign Language Recognition Method Based on Improved Transformer
Computer Science, 2022, 49(11A): 211200198-6. https://doi.org/10.11896/jsjkx.211200198
[11] 胡新荣, 陈志恒, 刘军平, 彭涛, 叶鹏, 朱强.
Sentiment Analysis Framework Based on Multimodal Representation Learning
Computer Science, 2022, 49(11A): 210900107-6. https://doi.org/10.11896/jsjkx.210900107
[12] 方仲俊, 张静, 李冬冬.
Spatial Encoding and Multi-layer Joint Encoding Enhanced Transformer for Image Captioning
Computer Science, 2022, 49(10): 151-158. https://doi.org/10.11896/jsjkx.210900159
[13] 胡艳丽, 童谭骞, 张啸宇, 彭娟.
Self-attention-based BGRU and CNN for Sentiment Analysis
Computer Science, 2022, 49(1): 252-258. https://doi.org/10.11896/jsjkx.210600063
[14] 杨慧敏, 马廷淮.
Compound Conversation Model Combining Retrieval and Generation
Computer Science, 2021, 48(8): 234-239. https://doi.org/10.11896/jsjkx.200700162
[15] 徐少伟, 秦品乐, 曾建朝, 赵致楷, 高媛, 王丽芳.
Mediastinal Lymph Node Segmentation Algorithm Based on Multi-level Features and Global Context
Computer Science, 2021, 48(6A): 95-100. https://doi.org/10.11896/jsjkx.200700067