Lightweight Micro-expression Recognition Architecture Based on Bottleneck Transformer

doi:10.11896/jsjkx.210500023

Computer Science, 2022, Vol. 49, Issue (6A): 370-377. doi: 10.11896/jsjkx.210500023

• Image Processing & Multimedia Technology •

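ZHANG Jia-hao¹, LIU Feng²,³,⁴, QI Jia-yin⁴

1 School of Computer Science and Technology, East China Normal University, Shanghai 200062
2 Shanghai Institute of Intelligent Education, East China Normal University, Shanghai 201620
3 Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062
4 Institute of Artificial Intelligence and Change Management, Shanghai University of International Business and Economics, Shanghai 201620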

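Publication date: 2022-06-10 Release date: 2022-06-08
Corresponding author: LIU Feng (lsttoy@163.com)
About author: (zjh20000218@163.com)
Supported by:
Sino-German Center project "Digital Transformation in China and Germany: Strategies, Structures and Solutions for Ageing Societies" (GZ1507); Shanghai Science and Technology Plan project (20dz2260300); Fundamental Research Funds for the Central Universities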

Lightweight Micro-expression Recognition Architecture Based on Bottleneck Transformer

ZHANG Jia-hao¹, LIU Feng²,³,⁴, QI Jia-yin⁴

1 School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
2 Shanghai Institute of Intelligent Education, East China Normal University, Shanghai 200062, China
3 Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, Other Institutes, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
4 Institute of Artificial Intelligence and Change Management, Shanghai University of International Business and Economics, Shanghai 201620, China

Online: 2022-06-10 Published: 2022-06-08
About author: ZHANG Jia-hao, born in 2000, undergraduate, is a student member of the China Computer Federation. His main research interests include affective computing, computer vision and deep learning.
LIU Feng, born in 1988, Ph.D. candidate, engineer, is a senior member of the China Computer Federation. His main research interests include deep learning, cognitive science and blockchain technology.
Supported by:
Digital Transformation in China and Germany: Strategies, Structures and Solutions for Ageing Societies (GZ1570), Research Project of Shanghai Science and Technology Commission (20dz2260300) and Fundamental Research Funds for the Central Universities.

Abstract

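Abstract (Chinese version): Micro-expressions are spontaneous facial movements that reveal a person's true emotions. They are brief and subtle, which makes them hard to recognize, yet they have significant research value. To address micro-expression emotion recognition, a new lightweight micro-expression recognition network, mini-AORCNN, is proposed. The network takes apex-onset optical-flow features as input and combines architectures from residual convolutional neural networks and vision Transformers to carry out the micro-expression recognition task effectively. It contains a new residual module with fewer parameters, and it replaces the convolution operators in the last residual block with self-attention operators, thereby realizing the Bottleneck Transformer architecture. Evaluated with leave-one-subject-out (LOSO) cross-validation on the CASME series of datasets from the Chinese Academy of Sciences, the network achieves an unweighted average recall (UAR) of 73.09% and an unweighted average F1-score (UF1) of 72.25% on the emotion classification task. These accuracy metrics, together with an extremely low parameter count (39 185), show a clear advantage in comparison with several mainstream models in the micro-expression field. The paper also includes a set of ablation experiments that confirm the benefit of optical strain magnitude, the self-attention mechanism, and relative position encoding.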

Keywords: Residual convolutional neural network, Computational affection, Visual Transformer, Micro-expression recognition, Self-attention mechanism
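
The leave-one-subject-out (LOSO) protocol named in the abstract can be illustrated with a minimal sketch. It assumes each sample carries a subject identifier; the function name `loso_splits` and the toy subject labels are hypothetical and are not taken from the paper. Each fold holds out every sample of one subject for testing and trains on the rest, so no subject appears on both sides of a split.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    # Group sample indices by the subject they belong to.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    # One fold per subject: that subject's samples form the test set.
    for sid in sorted(by_subject):
        test_idx = by_subject[sid]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(subject_ids)) if i not in held_out]
        yield sid, train_idx, test_idx


# Toy example with three hypothetical subjects.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
for sid, train_idx, test_idx in loso_splits(subjects):
    print(sid, train_idx, test_idx)
```

A common convention, which the abstract does not spell out, is to pool the predictions from all folds before computing the final metrics.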

Abstract: Micro-expressions are spontaneous facial movements at a marginal spatiotemporal scale that reveal one's true feelings. They are short in duration and slight in amplitude, which makes them difficult to recognize, but they have important research value. To solve the micro-expression recognition problem, a novel, extremely lightweight micro-expression recognition neural architecture is proposed. The network takes apex-onset optical-flow features as input and integrates approaches from residual convolutional networks and visual Transformers, and it can effectively solve the micro-expression sentiment classification problem. The architecture contains novel parameter-saving residual blocks and a bottleneck Transformer block that replaces the convolution operators in residual blocks with a self-attention mechanism. The model evaluation experiments are conducted with a LOSO cross-validation strategy on a combined database consisting of the 3 CASME datasets. With markedly fewer total parameters (39 685), the model achieves an average recall of 73.09% and an average F1-Score of 72.25%, exceeding the mainstream architectures in this domain. A series of ablation experiments is also conducted to verify the benefit of the optical strain strength, the self-attention mechanism and relative position encoding.

Key words: Computational affection, Micro-expression recognition, Residual convolutional neural network, Self-attention mechanism, Visual Transformer
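
The two reported metrics, average recall (UAR) and average F1-Score (UF1), are commonly computed by averaging per-class recall and per-class F1 with equal weight for every class. The sketch below follows that definition under stated assumptions: the function name `uar_uf1` and the toy labels are illustrative only, and the paper's own evaluation code is not shown in this page.

```python
from collections import Counter


def uar_uf1(y_true, y_pred):
    """Return (UAR, UF1): per-class recall and F1, averaged with equal class weight."""
    classes = sorted(set(y_true))  # every listed class has at least one true sample
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # missed a sample of the true class
            fp[p] += 1  # wrongly predicted the other class

    recalls, f1s = [], []
    for c in classes:
        recalls.append(tp[c] / (tp[c] + fn[c]))
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)


# Toy three-class example (labels are hypothetical).
y_true = ["neg", "neg", "pos", "pos", "sur", "sur"]
y_pred = ["neg", "pos", "pos", "pos", "sur", "neg"]
uar, uf1 = uar_uf1(y_true, y_pred)
print(f"UAR={uar:.4f} UF1={uf1:.4f}")
```

Because every class contributes equally regardless of its sample count, these metrics are less sensitive to class imbalance than plain accuracy.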

CLC number: TP301.6

ZHANG Jia-hao, LIU Feng, QI Jia-yin. Lightweight Micro-expression Recognition Architecture Based on Bottleneck Transformer[J]. Computer Science, 2022, 49(6A): 370-377. https://doi.org/10.11896/jsjkx.210500023
