Facial Expression Recognition Integrating 3D Facial Dynamic Information and Optical Flow Information

doi:10.11896/jsjkx.230700210

Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230700210-7. doi: 10.11896/jsjkx.230700210

• Image Processing & Multimedia Technology •

Corresponding author: ZHANG Huazhong (zhz_233@yeah.net)

Facial Expression Recognition Integrating 3D Facial Dynamic Information and Optical Flow Information

ZHANG Huazhong, PAN Yuekai, TU Xiaoguang, LIU Jianhua, XU Luopeng, ZHOU Chao

Institute of Electronic and Electrical Engineering, Civil Aviation Flight University of China, Guanghan, Sichuan 618300, China

Published: 2024-06-06
About author: ZHANG Huazhong, born in 1989, associate professor, master’s supervisor. His main research interests include flying qualities monitoring, artificial intelligence and image processing.
Supported by:
China Postdoctoral Science Foundation (2022M722248), Project of Basic Scientific Research of Central Universities of China (J2023-026, ZHMH2022-004), Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (FZ2022KF06) and Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (FZ2021ZZ03).

Abstract

Abstract: Facial expression recognition performs well on static images, but accuracy and robustness often degrade when the same methods are applied to videos or image sequences. Traditional methods usually recognize facial expressions from spatial information and optical flow information. However, both of these cues are two-dimensional and do not account for the fact that a change in facial expression is a three-dimensional process. To fully exploit the deep semantic information available for facial expression recognition, this paper proposes a fusion method that combines 3D facial dynamic information with optical flow information. The method builds a multi-stream convolutional neural network on facial depth images, optical flow images, and RGB images, and fuses the information from these three modalities to recognize facial expressions. The method is evaluated on the CAER and RAVDESS datasets, and the experimental results show that it outperforms current mainstream methods in facial expression recognition performance, which supports its effectiveness.

Key words: Facial expression recognition, Multi-stream convolutional neural network, 3D facial dynamic information, Optical flow information

CLC Number: TP391.41

ZHANG Huazhong, PAN Yuekai, TU Xiaoguang, LIU Jianhua, XU Luopeng, ZHOU Chao. Facial Expression Recognition Integrating 3D Facial Dynamic Information and Optical Flow Information[J]. Computer Science, 2024, 51(6A): 230700210-7. https://doi.org/10.11896/jsjkx.230700210
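
The abstract describes a multi-stream network that fuses RGB, optical flow, and depth information, but it does not say how the streams are combined. The sketch below illustrates one common option, late fusion by weighted averaging of per-stream class probabilities. This is an assumption made for illustration, not the authors' confirmed design. The label set, the stream names, the weights, and the logit values are all hypothetical placeholders standing in for the outputs of three trained CNN branches.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label set for illustration; CAER and RAVDESS define their own classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits: Dict[str, Sequence[float]],
                 weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted average of the class probabilities of each stream."""
    weight_sum = sum(weights[name] for name in stream_logits)
    fused = [0.0] * len(EMOTIONS)
    for name, logits in stream_logits.items():
        if len(logits) != len(EMOTIONS):
            raise ValueError(f"stream {name!r} has the wrong number of classes")
        for i, p in enumerate(softmax(logits)):
            fused[i] += weights[name] / weight_sum * p
    return fused


def predict(stream_logits: Dict[str, Sequence[float]],
            weights: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_streams(stream_logits, weights)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]


# Made-up logits standing in for the outputs of three trained CNN streams.
example_logits = {
    "rgb": [0.2, 1.5, 0.3, 0.1],
    "optical_flow": [0.1, 0.9, 1.1, 0.0],
    "depth": [0.0, 1.8, 0.2, 0.4],
}
# Equal weights are an assumption; in practice they would be tuned or learned.
equal_weights = {"rgb": 1.0, "optical_flow": 1.0, "depth": 1.0}

fused_probs = fuse_streams(example_logits, equal_weights)
label = predict(example_logits, equal_weights)
print(label, [round(p, 3) for p in fused_probs])
```

In this toy input the optical flow stream alone favors a different class than the RGB and depth streams, and the fused prediction follows the two streams that agree. Feature-level fusion, where intermediate feature maps are concatenated before a shared classifier, is an equally plausible reading of the abstract.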

