Computer Science ›› 2022, Vol. 49 ›› Issue (10): 10-17. doi: 10.11896/jsjkx.220100128
WANG Lu-han1,2, JIA Hai-peng1, ZHANG Yun-quan1, ZHANG Guang-ting1
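Abstract: Intel Integrated Performance Primitives (Intel IPP) is a high-performance multimedia acceleration library for signal and image processing. To date, however, no high-performance IPP library exists for the ARM architecture. This paper implements PerfIPP, a high-performance algorithm library for ARM computing platforms that covers basic image geometric transformation algorithms: mirror, remap, affine, and perspective transforms. The performance of these algorithms is significantly improved through SIMD assembly optimization, memory alignment, data precomputation, and high-performance matrix transposition. By comparing the performance differences caused by different instruction combinations, different instruction orderings, and different load/store methods, the paper also summarizes the key techniques for implementing and optimizing image geometric transformation algorithms on ARM computing platforms. Experimental results show that on the Huawei Kunpeng 920 platform, compared with the open-source computer vision library OpenCV, PerfIPP meets the accuracy requirements while achieving a performance improvement of 108.08% to 435.5% on these transforms, and reaches 83.79% of the average performance of the Intel IPP library on an Intel Xeon E5-2640 processor.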
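As an illustration of the data precomputation idea mentioned in the abstract, the following is a minimal scalar sketch of a nearest-neighbour affine warp. It is not PerfIPP's implementation, which the abstract describes as SIMD assembly. The function name `affine_warp_nearest`, the coefficient layout, and the choice of inverse mapping are assumptions made for this example only. The x-dependent terms are computed once per image and the y-dependent terms once per row, so the inner loop only adds and rounds.

```python
import math


def affine_warp_nearest(src, coeffs, fill=0):
    """Warp a 2-D list image with an inverse affine map (hypothetical helper).

    coeffs = (a, b, c, d, e, f) maps a destination pixel (x, y) to the
    source position (a*x + b*y + c, d*x + e*y + f). Pixels whose source
    position falls outside the image are set to `fill`.
    """
    height, width = len(src), len(src[0])
    a, b, c, d, e, f = coeffs

    # Precompute the x-dependent terms once; every row reuses them.
    ax = [a * x for x in range(width)]
    dx = [d * x for x in range(width)]

    dst = [[fill] * width for _ in range(height)]
    for y in range(height):
        # Row-constant terms are hoisted out of the inner loop.
        row_x = b * y + c
        row_y = e * y + f
        for x in range(width):
            # Round half up to pick the nearest source pixel.
            sx = math.floor(ax[x] + row_x + 0.5)
            sy = math.floor(dx[x] + row_y + 0.5)
            if 0 <= sx < width and 0 <= sy < height:
                dst[y][x] = src[sy][sx]
    return dst


image = [[1, 2, 3],
         [4, 5, 6]]
# A horizontal mirror is the special case x' = (width - 1) - x, y' = y.
mirrored = affine_warp_nearest(image, (-1, 0, 2, 0, 1, 0))
```

In this sketch the mirror transform falls out as a special case of the affine map; a dedicated mirror routine would normally avoid the coordinate arithmetic entirely and reverse each row directly.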