Computer Science ›› 2021, Vol. 48 ›› Issue (4): 197-204. doi: 10.11896/jsjkx.200600033

• Computer Graphics & Multimedia •

Optimization of GPU-based Eigenface Algorithm

LI Fan1, YAN Xing2, ZHANG Xiao-yu1

  1 Network & Experimental Teaching Center, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2 School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Received: 2020-06-24 Revised: 2020-08-28 Online: 2021-04-15 Published: 2021-04-09
  • Corresponding author: LI Fan (lifanxj@163.com)
  • About author: LI Fan, born in 1974, Ph.D., associate professor. His main research interests include high-performance computing.
  • Supported by:
    National Natural Science Foundation of China (41830101), Social Science Foundation of Xinjiang Uygur Autonomous Region (17BTQ093), and Doctoral Research Start-up Fund of Xinjiang University of Finance and Economics (2015BS003).

Abstract: The eigenface algorithm is a commonly used face recognition method based on facial representation. When the amount of training data is large, both the training and the testing modules are very time-consuming. To address this, the CUDA parallel computing architecture is used to accelerate the eigenface algorithm on the GPU. Because the benefit of GPU parallel computing depends on the hardware specifications, the complexity and parallelizability of the algorithm itself, and the way the developer parallelizes the work on the GPU, this paper first proposes GPU acceleration for the following computation steps: computing the mean, zero-mean subtraction, and eigenface normalization in the training phase, and projection onto the eigenface space and Euclidean distance computation in the testing phase. Second, different parallelization methods are applied to the corresponding steps and their performance is evaluated. Experimental results show that, for face training data sizes in the range of 320 to 1920, every computation step is clearly accelerated. Compared with an Intel i7-5960X, a GTX1060 graphics card achieves an average speedup of about 71.7 times in the training module and about 34.1 times in the testing module.

Key words: Eigenface, Face recognition, GPU parallel computing, Kernel function, Rotary operation

CLC Number:

  • TP301
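
The computation steps named in the abstract can be illustrated with a small serial reference. The sketch below is not the authors' CUDA implementation; it is a plain-Python illustration, using only the standard library, of the mean, zero-mean, eigenface normalization, projection, and Euclidean distance steps. The function names (`train`, `project`, `recognize`) and the assumption that unnormalized eigenfaces are supplied by a separate eigen-decomposition step are hypothetical and chosen for illustration only.

```python
import math


def project(centered_face, eigenfaces):
    # Testing step: one dot product per eigenface gives one weight.
    return [sum(a * b for a, b in zip(centered_face, e)) for e in eigenfaces]


def train(faces, eigvecs):
    """faces: M flattened images, each a list of N pixel values.
    eigvecs: K unnormalized eigenfaces of length N (assumed to come from a
    separate eigen-decomposition step that is not shown here)."""
    m, n = len(faces), len(faces[0])
    # Step 1: mean face, one independent sum per pixel position.
    mean = [sum(f[j] for f in faces) / m for j in range(n)]
    # Step 2: zero mean, one independent subtraction per pixel per image.
    centered = [[f[j] - mean[j] for j in range(n)] for f in faces]
    # Step 3: normalize every eigenface to unit Euclidean length.
    eigenfaces = []
    for v in eigvecs:
        norm = math.sqrt(sum(x * x for x in v))
        eigenfaces.append([x / norm for x in v])
    # Store the training weights so that test faces can be compared to them.
    weights = [project(c, eigenfaces) for c in centered]
    return mean, eigenfaces, weights


def recognize(face, mean, eigenfaces, weights):
    # Project the zero-mean test face into the eigenface space.
    centered = [p - mu for p, mu in zip(face, mean)]
    w = project(centered, eigenfaces)
    # Euclidean distance to every stored training weight vector.
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(w, tw)))
             for tw in weights]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]
```

In this sketch, each per-pixel mean, each subtraction, each projection weight, and each distance is computed independently of the others, which is the kind of data parallelism that lends itself to one-thread-per-element GPU kernels. How the paper actually maps these steps to CUDA kernels is not described on this page.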