Computer Science, 2021, Vol. 48, Issue (7): 86-92. doi: 10.11896/jsjkx.210200127

Special Issue: Artificial Intelligence Security


Deepfake Video Detection Based on 3D Convolutional Neural Networks

XING Hao, LI Ming   

  1. College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2021-02-22 Revised: 2021-04-29 Online: 2021-07-15 Published: 2021-07-02
  • About author: XING Hao, born in 1994, master. His main research interests include computer vision and artificial intelligence. (923136917@qq.com)
    LI Ming, born in 1982, Ph.D., professor, Ph.D. supervisor. His main research interests include computer vision and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (11771321) and Shanxi Province Plan Project on Science and Technology of Social Development (201703D321032).

Abstract: In recent years, “Deepfake” has attracted widespread attention. Deepfake videos are difficult for people to distinguish from real ones, yet these forged videos pose serious potential threats to society, for example when they are used to fabricate fake news. It is therefore necessary to find a method for identifying such synthetic videos. To address this problem, a Deepfake video detection model based on 3D convolutional neural networks (3D CNNs) is proposed. The model exploits the inconsistency of temporal and spatial features in Deepfake videos, which 3D CNNs can capture effectively. Experimental results on the Deepfake Detection Challenge (DFDC) dataset and the Celeb-DF dataset show that models based on 3D CNNs achieve high accuracy and strong robustness. The detection accuracy of the proposed model reaches 96.25%, and its AUC reaches 0.92. The model also addresses the problem of poor generalization. Compared with existing Deepfake detection models, the proposed model performs better in terms of both detection accuracy and AUC, which verifies its effectiveness.
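The abstract's central premise is that a 3D convolution, whose kernel spans several consecutive frames as well as a spatial window, can respond to temporal inconsistencies that a filter applied to each frame in isolation cannot. The pure-Python sketch below illustrates only that mechanism; it is not the authors' model. It assumes a single-channel clip stored as nested lists indexed `[frame][row][column]`, "valid" padding, stride 1, and a hand-set temporal-difference kernel (`temporal_diff`) chosen purely for illustration. In a real 3D CNN the kernels are multi-channel and learned from data, and the names `conv3d_valid`, `make_clip` and `max_abs` are hypothetical.

```python
from typing import List

# A single-channel video clip indexed as clip[t][y][x] (frames, rows, columns).
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Single-channel 3D cross-correlation, 'valid' padding, stride 1.

    The kernel covers kt consecutive frames, so every output value mixes
    a spatial neighbourhood with a temporal one.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(frames: int, size: int, value: float) -> Clip:
    """Build a constant clip of `frames` square frames of side `size`."""
    return [[[value] * size for _ in range(size)] for _ in range(frames)]


def max_abs(volume: Clip) -> float:
    """Largest absolute response anywhere in a 3D output volume."""
    return max(abs(v) for plane in volume for row in plane for v in row)


# Hypothetical hand-set kernel of shape (2, 1, 1): computes frame[t+1] - frame[t].
temporal_diff: Clip = [[[-1.0]], [[1.0]]]

static = make_clip(4, 3, 0.5)
flicker = make_clip(4, 3, 0.5)
flicker[2][1][1] = 1.0  # a single-frame artifact at the centre pixel of frame 2

print(max_abs(conv3d_valid(static, temporal_diff)))   # 0.0
print(max_abs(conv3d_valid(flicker, temporal_diff)))  # 0.5
```

Because the kernel spans two frames, the single-frame artifact in `flicker` produces a response of magnitude 0.5 at the two time steps that straddle frame 2, while the temporally consistent `static` clip produces none. A kernel with a temporal extent of 1 would see each frame in isolation and could not compare one frame with the next.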

Key words: 3D CNNs, Deepfake detection, Spatial features, Synthetic videos, Temporal features

CLC Number: TP391.41