Computer Science, 2021, Vol. 48, Issue (3): 50-59. doi: 10.11896/jsjkx.210100210

Special topic: Advances in Multimedia Technology


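Survey on Video-based Face Recognition

BAI Zi-yi, MAO Yi-rong, WANG Rui-ping

  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-12-05  Revised: 2021-01-27  Online: 2021-03-15  Published: 2021-03-05
  • Corresponding author: WANG Rui-ping (wangruiping@ict.ac.cn)
  • Author contact: ziyi.bai@vipl.ict.ac.cn
  • Supported by:
    National Natural Science Foundation of China (61922080, U19B2036, 61772500)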

About the authors:

  • BAI Zi-yi, born in 1997, is a postgraduate student. Her main research interests include computer vision and pattern recognition.
  • WANG Rui-ping, born in 1981, Ph.D., is a professor and Ph.D. supervisor and a senior member of the China Computer Federation. His main research interests include computer vision and pattern recognition.

Abstract: Face recognition is a key technology in biometrics and has attracted wide attention from researchers over the past decades. Video-based face recognition refers specifically to extracting the key facial information from a video in order to identify a person. Compared with image-based face recognition, faces in video exhibit far more diverse variation patterns, and individual frames can differ greatly from one another, so current research focuses on how to extract the key facial features from long and complex videos. This paper first introduces the research value and challenges of video-based face recognition. It then systematically reviews the development of current research: according to how the video is modeled, traditional methods based on image-set modeling are divided into four categories, namely linear subspace modeling, affine subspace modeling, nonlinear manifold modeling, and statistical modeling, and deep-learning methods based on image fusion are also introduced. The paper further categorizes existing datasets for video-based face recognition and briefly reviews commonly used performance metrics. Finally, representative methods are evaluated on the YTC and IJB-A datasets using grayscale features and deep features, respectively. The experimental results show that neural networks trained on large-scale data can extract robust frame-level features, which greatly improves recognition performance, and that effective video modeling can uncover the latent variation patterns of faces, finding more discriminative information among the many samples in a video sequence while suppressing the interference of noisy samples. Video-based face recognition is therefore broadly applicable and of practical value.
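
As a rough illustration of two of the modeling families named in the abstract, the sketch below compares two image sets through the principal angles between their linear subspaces and fuses frame-level features by simple averaging. It is a minimal sketch under stated assumptions, not the method of any surveyed paper: the function names (`subspace_basis`, `subspace_similarity`, `average_pool`), the subspace dimension, and the use of plain average pooling are all choices made here for illustration.

```python
import numpy as np


def subspace_basis(image_set: np.ndarray, dim: int) -> np.ndarray:
    """Return an orthonormal basis (d x dim) for the linear subspace of a set.

    `image_set` has shape (n_frames, d): one feature vector per video frame.
    Hypothetical helper; the data is not mean-centred (linear, not affine).
    """
    # Left singular vectors of the d x n data matrix span the linear subspace.
    u, _, _ = np.linalg.svd(image_set.T, full_matrices=False)
    return u[:, :dim]


def subspace_similarity(set_a: np.ndarray, set_b: np.ndarray, dim: int = 3) -> float:
    """Mean squared cosine of the principal angles between two set subspaces."""
    basis_a = subspace_basis(set_a, dim)
    basis_b = subspace_basis(set_b, dim)
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.mean(cosines ** 2))


def average_pool(frame_features: np.ndarray) -> np.ndarray:
    """Fuse frame-level features into one L2-normalised video-level vector."""
    pooled = frame_features.mean(axis=0)
    return pooled / np.linalg.norm(pooled)
```

Under these assumptions, a video compared with itself scores close to 1 and two videos whose features lie in orthogonal subspaces score close to 0. Learned aggregation or manifold-based distances would replace these simple choices in practice.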

Key words: Deep learning, Image set modeling, Manifold learning, Subspace learning, Video-based face recognition

CLC number:

  • TP391