Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (7): 86-92. doi: 10.11896/jsjkx.210200127

Special topic: Artificial Intelligence Security


Deepfake Video Detection Based on 3D Convolutional Neural Networks

XING Hao, LI Ming

  1. College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2021-02-22 Revised: 2021-04-29 Online: 2021-07-15 Published: 2021-07-02
  • Corresponding author: LI Ming (lm13653600949@126.com)
  • About author: XING Hao, born in 1994, master. His main research interests include computer vision and artificial intelligence. (923136917@qq.com)
    LI Ming, born in 1982, Ph.D, professor, Ph.D supervisor. His main research interests include computer vision and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (11771321) and Shanxi Province Plan Project on Science and Technology of Social Development (201703D321032).

Abstract: In recent years, “Deepfake” videos have attracted widespread attention. Such videos are difficult for people to distinguish from real footage, and they pose a serious potential threat to society, for example when used to produce fake news. An effective method for identifying these synthetic videos is therefore needed. To address this problem, a Deepfake video detection model based on 3D CNNs is proposed. The model exploits the inconsistency between temporal and spatial features in Deepfake videos, which 3D CNNs can capture effectively. Experimental results show that the 3D CNN-based model achieves high accuracy and strong robustness on the Deepfake Detection Challenge dataset and the Celeb-DF dataset: detection accuracy reaches 96.25% and the AUC reaches 0.92. The model also addresses the problem of poor generalization. Compared with existing Deepfake detection models, the proposed model is superior in both detection accuracy and AUC, which verifies its effectiveness.

Key words: 3D CNNs, Deepfake detection, Spatial features, Synthetic videos, Temporal features

CLC Number: TP391.41
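
The abstract's central idea is that a 3D convolution spans time as well as space, so it can respond to frame-to-frame inconsistencies that a per-frame 2D filter cannot see. The sketch below is only a toy illustration of that property, not the authors' network: the function `conv3d_valid`, the hand-set kernel, and the synthetic clips are all assumptions made for this example.

```python
import numpy as np


def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (t, h, w) kernel."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output value mixes pixels from t consecutive frames.
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out


# Hypothetical kernel: average a 3x3 patch, then subtract the previous frame's average.
kernel = np.zeros((2, 3, 3))
kernel[0] = -1.0 / 9.0
kernel[1] = 1.0 / 9.0

# Synthetic grayscale clips: 4 frames of 5x5 pixels.
static_clip = np.ones((4, 5, 5))   # temporally consistent
flicker_clip = static_clip.copy()
flicker_clip[2] += 0.5             # one frame is brighter than its neighbours

static_response = conv3d_valid(static_clip, kernel)
flicker_response = conv3d_valid(flicker_clip, kernel)

print("max |response| on static clip: ", np.abs(static_response).max())
print("max |response| on flicker clip:", np.abs(flicker_response).max())
```

The kernel stays silent on the temporally consistent clip and fires around the altered frame. A trained 3D CNN learns many such spatio-temporal filters from data rather than having them set by hand.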