Computer Science ›› 2024, Vol. 51 ›› Issue (10): 295-301. doi: 10.11896/jsjkx.230900094

• Computer Graphics & Multimedia •


Eye Gaze Estimation Network Based on Class Attention

XU Jinlong1,3, DONG Mingrui1,2, LI Yingying1,3, LIU Yanqing1, HAN Lin1   

  1 National Supercomputing Center in Zhengzhou,Zhengzhou 450000,China
    2 School of Computer and Artificial Intelligence,Zhengzhou University,Zhengzhou 450000,China
    3 Information Engineering University,Zhengzhou 450000,China
  • Received:2023-09-18 Revised:2024-01-16 Online:2024-10-15 Published:2024-10-11
  • Corresponding author: HAN Lin(hanlin@zzu.edu.cn)
  • About author:XU Jinlong,born in 1985,Ph.D,master's supervisor(longkaizh@163.com).His main research interests include high performance computing and parallel compilation.
    HAN Lin,born in 1978,Ph.D,associate professor,senior member of CCF(No.16416M).His main research interests include compiler optimization and high performance computing.
  • Supported by:2022 Henan Province Major Science and Technology Special Project(221100210600),2022 Qiushi Research Start-up Project(Natural Science)(32213247) and 2023 Henan Province Science and Technology Research Special Project(232102210185).


Abstract: In recent years,eye gaze estimation has attracted widespread attention.RGB appearance-based gaze estimation methods use ordinary cameras and deep learning,avoiding the expensive infrared hardware required by commercial eye trackers and opening the way to more accurate,lower-cost eye gaze estimation.However,RGB appearance images contain many features that are unrelated to gaze,such as lighting intensity and skin color;these irrelevant features interfere with the deep learning regression process and thereby degrade the accuracy of gaze estimation.To address this problem,this paper proposes a new architecture named class attention network(CA-Net),which contains three kinds of class attention modules:channel,scale,and eye.These modules extract and fuse different types of attention encodings,thereby reducing the weight assigned to gaze-irrelevant features.Extensive experiments on the GazeCapture dataset show that,among RGB appearance-based gaze estimation methods,CA-Net improves gaze estimation accuracy over the existing state of the art by approximately 0.6% on mobile phones and 7.4% on tablets.
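
The abstract names CA-Net's building blocks (channel, scale, and eye class attention; a light squeeze-and-excitation; self-attention fusion) but this page carries no implementation details. As a rough illustration only, the minimal sketch below shows a squeeze-and-excitation-style channel gate of the kind the "light squeeze-and-excitation" keyword suggests; the module name, reduction ratio, and tensor shapes are hypothetical assumptions, not the authors' CA-Net code.

    # Hypothetical SE-style channel-attention sketch (illustrative; not the CA-Net source).
    import torch
    import torch.nn as nn

    class ChannelClassAttention(nn.Module):
        """Re-weights feature channels, suppressing gaze-irrelevant ones."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average per channel
            self.gate = nn.Sequential(            # excitation: bottleneck MLP -> sigmoid gate
                nn.Linear(channels, channels // reduction, bias=False),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels, bias=False),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w                           # scale each channel by its learned weight

    # Usage: gate a batch of eye-image feature maps.
    feats = torch.randn(8, 64, 28, 28)
    print(ChannelClassAttention(64)(feats).shape)  # torch.Size([8, 64, 28, 28])

In a gate of this kind, the sigmoid outputs small weights for channels dominated by gaze-irrelevant cues such as lighting or skin color, which matches the abstract's stated goal of down-weighting those features.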

Key words: Class attention, Light squeeze-and-excitation, Self-attention, Multiscale, Eye gaze estimation

CLC Number: TP183