Computer Science ›› 2016, Vol. 43 ›› Issue (8): 318-322. doi: 10.11896/j.issn.1002-137X.2016.08.065

• Graphics, Image and Pattern Recognition •

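Implementation for Compressed Sensing Reconstruction Algorithm Based on GPU

ZHANG Jing, XIONG Cheng-yi and GAO Zhi-rong

  1. College of Electronic and Information Engineering, South-Central University for Nationalities, Wuhan 430074; College of Electronic and Information Engineering, South-Central University for Nationalities, Wuhan 430074; College of Computer Science, South-Central University for Nationalities, Wuhan 430074
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China.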

Abstract: To meet the real-time requirements of large-scale compressed sensing reconstruction, this paper discusses a method for accelerating the Orthogonal Matching Pursuit (OMP) algorithm on a Graphics Processing Unit (GPU) and its implementation. To avoid the high latency of data transfers between the central processing unit (CPU) and the GPU, the entire iterative process of OMP is moved to the GPU and executed in parallel. On the GPU side, the CUDA program is restructured according to the access characteristics of global memory so that memory accesses satisfy the coalescing conditions, which reduces access latency. In addition, based on the resources available in each Streaming Multiprocessor (SM), more shared memory is allocated per SM, and the thread access pattern is modified to reduce bank conflicts and speed up memory access. Tests on an NVIDIA Tesla K20Xm GPU and an Intel(R) E5-2650 CPU show that the two most time-consuming modules, projection and weight update, achieve speedups of 32 and 46 times respectively, and that the algorithm as a whole achieves a speedup of 34 times.

Key words: Compressed sensing reconstruction, Orthogonal matching pursuit, Graphics processing unit, Parallel execution, Acceleration
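As a rough illustration of the OMP iteration that the abstract describes moving onto the GPU, the following is a minimal CPU-side sketch in plain Python. It is not the authors' CUDA implementation. The function names (`dot`, `solve_linear`, `omp`), the normal-equations solve for the weight update, and the toy dictionary are all assumptions made for illustration.

```python
# Minimal pure-Python OMP sketch (illustrative only; not the paper's CUDA code).
# All function names and the toy data below are hypothetical.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve_linear(G, b):
    """Solve G w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / M[i][i]
    return w

def omp(atoms, y, sparsity):
    """atoms: columns of the sensing matrix (unit norm); y: measurements."""
    residual = y[:]
    support, weights = [], []
    for _ in range(sparsity):
        # Projection step: correlate the residual with every atom.
        # Each dot product is independent of the others.
        scores = [abs(dot(a, residual)) for a in atoms]
        for j in support:
            scores[j] = -1.0  # never pick the same atom twice
        support.append(max(range(len(atoms)), key=scores.__getitem__))
        # Weight update: least squares over the selected atoms,
        # solved here through the normal equations (an assumption).
        sel = [atoms[j] for j in support]
        gram = [[dot(a, b) for b in sel] for a in sel]
        rhs = [dot(a, y) for a in sel]
        weights = solve_linear(gram, rhs)
        # Residual update: subtract the current approximation from y.
        approx = [sum(w * a[i] for w, a in zip(weights, sel))
                  for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    x = [0.0] * len(atoms)
    for j, w in zip(support, weights):
        x[j] = w
    return x, residual

# Toy example: y is built from atoms 0 and 2 with weights 2 and -3.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
y = [2.0, 0.0, -3.0]
x_hat, res = omp(atoms, y, sparsity=2)
print(x_hat, res)
```

The projection step is a set of independent dot products, which is the kind of work that maps naturally onto GPU threads. The abstract reports this module and the weight-update module as the most time-consuming parts of the algorithm.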

