Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 220100106-9.doi: 10.11896/jsjkx.220100106

• Image Processing & Multimedia Technology •

Detection of Deepfakes Based on Dual-stream Network

LI Ying1, BIAN Shan1,2,3, WANG Chun-tao1,2, HUANG Qiong1,2   

  1 College of Mathematics and Informatics,South China Agricultural University,Guangzhou 510642,China
    2 Guangzhou Key Laboratory of Intelligent Agriculture,Guangzhou 510642,China
    3 Guangdong Provincial Key Laboratory of Information Security Technology,Guangzhou 510006,China
  • Online:2022-11-10 Published:2022-11-21
  • About author:LI Ying,born in 1998,postgraduate,is a student member of China Computer Federation.Her main research interests include video forensics.
    BIAN Shan,born in 1986,Ph.D,associate professor,is a member of China Computer Federation.Her main research interests include video forensics and tampering detection.
  • Supported by:
    National Natural Science Foundation of China(61702199,62172165,61872152),Major Program of Guangdong Basic and Applied Research(2019B030302008),Opening Project of Guangdong Province Key Laboratory of Information Security Technology(2020B1212060078-07) and Science and Technology Program of Guangzhou(202102020582,201902010081).

Abstract: Deepfake is a kind of deep network model based on generative adversarial networks (GAN). It uses source and target faces to generate highly realistic face videos that are difficult to identify. If a malicious person uses this technology to make fake videos and spread rumors on the Internet, it will infringe on personal portrait rights, cause adverse social impact, or even cause serious judicial disputes. In view of the serious threat posed by deepfake technology, many researchers at home and abroad have paid close attention to deepfake detection and have put forward some effective detection methods. Existing detection methods achieve good results on high-quality videos, but most videos in daily applications are compressed into low-quality versions by social software, and the performance of most existing deepfake detection methods declines significantly on such low-quality videos. Besides, the detection performance of existing methods remains unsatisfactory in cross-dataset settings, limiting their real-world application. To address these issues, this paper proposes a dual-stream network structure based on the Xception model with a multiple attention mechanism. The network includes an RGB branch using the multiple attention mechanism and a frequency-domain branch for capturing low-quality video artifacts. Our research finds that the tiny differences between real and fake images tend to concentrate in local areas. The RGB branch under the multiple attention mechanism makes the model focus on different regions of the face, so it can obtain global features that aggregate low-level texture and high-level semantic features under the guidance of attention maps. Complementing the RGB branch, the discrete cosine transform (DCT) is introduced in the frequency-domain branch to provide a complementary feature representation that can reflect subtle forgery traces or compression errors. Specifically, the proposed algorithm first extracts a large number of face frames from videos with a face extractor and feeds them into the two-branch network. The frequency branch decomposes the spectrum of each image with three combined filters that provide additional learnable parts. In the RGB branch, the first three layers of the backbone network extract shallow features including texture information; the attention module then makes the model attend to this shallow information from different local areas. The shallow information is fed to the attention pooling layer and aggregated with the high-level semantic features from the remaining layers of the backbone. Finally, the network merges the feature vectors from the RGB branch and the frequency branch to obtain the final discriminant result. The combination of the two branches significantly improves detection performance in cross-dataset scenarios and on low-quality video sets. To verify the effectiveness of the proposed network, extensive comparative experiments are conducted on three public datasets: FaceForensics++, Celeb-DF and DFDC. On the low-quality part of the FaceForensics++ dataset, the AUC (area under the curve) reaches 0.9271. At the video level, the detection accuracy on low-quality and high-quality videos reaches 93.84% and 99.69%, respectively. Experimental results show that the proposed algorithm outperforms existing detection algorithms on low-quality video sets as well as in cross-dataset scenarios, verifying that the combination of dual-stream features from the RGB branch and the frequency branch improves the robustness of the detection method, especially on low-quality videos and across datasets.
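The two branches described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the binary low/mid/high DCT band masks, the feature shapes, and the simple average pooling are assumptions (the paper's three combined filters include learnable parts, and both branches are CNNs built on an Xception backbone).

```python
import numpy as np
from scipy.fft import dct, idct

def dct2(block):
    """2-D type-II DCT: transform rows, then columns."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    """Inverse of dct2."""
    return idct(idct(block, axis=1, norm="ortho"), axis=0, norm="ortho")

def band_filters(size, bands=3):
    """Three binary masks partitioning the DCT spectrum into low/mid/high
    bands by distance from the DC coefficient (a simplifying assumption)."""
    u, v = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    radius = (u + v) / (2 * (size - 1))   # 0 at DC, 1 at highest frequency
    edges = np.linspace(0.0, 1.0, bands + 1)
    masks = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        upper = radius <= hi if i == bands - 1 else radius < hi
        masks.append((radius >= lo) & upper)
    return masks

def frequency_features(gray_face):
    """Frequency branch input: mask the DCT spectrum with each band filter
    and invert, yielding one spatial map per frequency band."""
    spec = dct2(gray_face)
    return np.stack([idct2(spec * m) for m in band_filters(gray_face.shape[0])])

def attention_pool(shallow_feats, attn_maps):
    """Attention pooling: each attention map re-weights the shallow feature
    maps; global average pooling then gives one vector per attended region."""
    # shallow_feats: (C, H, W); attn_maps: (M, H, W), non-negative weights
    pooled = [(shallow_feats * a[None]).mean(axis=(1, 2)) for a in attn_maps]
    return np.concatenate(pooled)          # shape (M * C,)

# Fusion: concatenate the RGB-branch and frequency-branch feature vectors
# before the final classifier (toy shapes).
rgb_vec = attention_pool(np.random.rand(8, 16, 16), np.random.rand(4, 16, 16))
freq_vec = frequency_features(np.random.rand(16, 16)).mean(axis=(1, 2))
fused = np.concatenate([rgb_vec, freq_vec])
```

Because the three band masks partition the spectrum and the IDCT is linear, the per-band maps sum back to the original image, which makes the decomposition lossless before any learnable filtering is applied.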

Key words: Deepfake, Video forensics, Dual-stream network, Attention mechanism, RGB branch, Frequency branch

CLC Number: TP391
[1]BRANDON J.Terrifying High-Tech Porn:Creepy "Deepfake" Videos Are on the Rise[EB/OL].Fox News.(2018-02-16)[2021-06-27].https://www.foxnews.com/tech/terrifying-high-tech-porn-creepy-deepfake-videos-are-on-the-rise.
[2]ROETTGERS J.Porn Producers Offer to Help Hollywood Take Down Deepfake Videos[EB/OL].(2018-02-21)[2021-06-27].https://variety.com/2018/digital/news/deepfakes-porn-adult-industry-1202705749/.
[3]GOODFELLOW I J,POUGET-ABADIE J,MIRZA M,et al.Generative Adversarial Networks[J/OL].arXiv:1406.2661,2014.
[4]ZHAO H,ZHOU W,CHEN D,et al.Multi-attentional deepfake detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:2185-2194.
[5]QIAN Y,YIN G,SHENG L,et al.Thinking in frequency:Face forgery detection by mining frequency-aware clues[C]//European Conference on Computer Vision.Cham:Springer,2020:86-103.
[6]AFCHAR D,NOZICK V,YAMAGISHI J,et al.MesoNet:a compact facial video forgery detection network[C]//2018 IEEE International Workshop on Information Forensics and Security(WIFS).IEEE,2018:1-7.
[7]ZHOU P,HAN X,MORARIU V I,et al.Two-stream neural networks for tampered face detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops(CVPRW).IEEE,2017:1831-1839.
[8]YU N,DAVIS L,FRITZ M.Attributing fake images to gans:Learning and analyzing gan fingerprints[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:7556-7566.
[9]MCCLOSKEY S,ALBRIGHT M.Detecting GAN-Generated Imagery Using Color Cues[J].arXiv:1812.08247,2018.
[10]XIAO J,GONG L Y,HUANG T Q,et al.Deepfake swapped face detection based on double attention[J].Chinese Journal of Network and Information Security,2021,7(2):151.
[11]BIAN M Y,PENG B,WANG W,et al.Detection of low-quality facial deepfake image based on void convolution[J].Modern Electronics Technique,2021,44(6):133-138.
[12]LI X R,YU K.A Deepfakes detection technique based on two-stream network[J].Journal of Cyber Security,2020,5(2):84-91.
[13]BAO Y X,LU T L,DU Y H,et al.Deepfake Videos Detection Method Based on i_ResNet34 Model and Data Augmentation[J].Computer Science,2021,48(7):77-85.
[14]LIU H,LI X,ZHOU W,et al.Spatial-phase shallow learning:rethinking face forgery detection in frequency domain[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:772-781.
[15]WANG J,WU Z,CHEN J,et al.M2TR:Multi-Modal Multi-Scale Transformers for Deepfake Detection[J].arXiv:2104.09770,2021.
[16]SZEGEDY C,LIU W,JIA Y Q,et al.Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:1-9.
[17]CHOLLET F.Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1251-1258.
[18]SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:2818-2826.
[19]LI Y,YANG X,SUN P,et al.Celeb-df:A large-scale challenging dataset for deepfake forensics[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:3207-3216.
[20]DOLHANSKY B,BITTON J,PFLAUM B,et al.The deepfake detection challenge(dfdc) dataset[J].arXiv:2006.07397,2020.
[21]ROSSLER A,COZZOLINO D,VERDOLIVA L,et al.FaceForensics++:Learning to detect manipulated facial images[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:1-11.
[22]COZZOLINO D,POGGI G,VERDOLIVA L.Recasting residual-based local descriptors as convolutional neural networks:an application to image forgery detection[C]//Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security.2017:159-164.
[23]LI Y,LYU S.Exposing deepfake videos by detecting face warping artifacts[J].arXiv:1811.00656,2018.
[24]LI L,BAO J,ZHANG T,et al.Face x-ray for more general face forgery detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:5001-5010.
[25]MASI I,KILLEKAR A,MARIAN R,et al.Two-branch recurrent network for isolating deepfakes in videos[C]//European Conference on Computer Vision.Cham:Springer,2020:667-684.
[26]BONETTINI N,CANNAS E D,MANDELLI S,et al.Video face manipulation detection through ensemble of cnns[C]//2020 25th International Conference on Pattern Recognition(ICPR).IEEE,2021:5012-5019.
[27]NGUYEN H H,FANG F,YAMAGISHI J,et al.Multi-task learning for detecting and segmenting manipulated facial images and videos[J].arXiv:1906.06876,2019.
[28]NGUYEN H H,YAMAGISHI J,ECHIZEN I.Use of a capsule network to detect fake images and videos[J].arXiv:1910.12467,2019.
[29]WODAJO D,ATNAFU S.Deepfake video detection using convolutional vision transformer[J].arXiv:2102.11126,2021.