Study on Implementation and Optimization of ARM-based Image Geometric Transformation Library

doi:10.11896/jsjkx.220100128

Computer Science (计算机科学), 2022, Vol. 49, Issue (10): 10-17.

WANG Lu-han (王麓涵)¹,², JIA Hai-peng (贾海鹏)¹, ZHANG Yun-quan (张云泉)¹, ZHANG Guang-ting (张广婷)¹

1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2022-01-14 Revised: 2022-04-28 Online: 2022-10-15 Published: 2022-10-13
Corresponding author: JIA Hai-peng (jiahaipeng@ict.ac.cn)
About the authors: WANG Lu-han (iswangluhan@foxmail.com), born in 1999, postgraduate. His main research interests include high performance computing and parallel software.
JIA Hai-peng, born in 1983, Ph.D. His main research interests include high performance computing, many-core programming methods and key optimization technologies for many-core platforms.
Supported by:
National Key R&D Program of China (2017YFB0202105), National Natural Science Foundation of China (61972376) and Natural Science Foundation of Beijing, China (L182053).

Abstract

Abstract: Intel Integrated Performance Primitives (Intel IPP) is a high-performance multimedia acceleration library for signal and image processing. However, no high-performance IPP library is currently available for the ARM architecture. This paper implements PerfIPP, a high-performance algorithm library on the ARM computing platform that covers basic image geometric transformation algorithms such as mirror, remap, affine and perspective transformation. Through optimization techniques including SIMD assembly optimization, memory alignment, data pre-computation and high-performance matrix transposition, PerfIPP significantly improves the performance of these algorithms. In addition, by comparing the performance differences caused by different instruction combinations, different instruction orderings, and different load and store methods, the paper summarizes the key techniques for implementing and optimizing image geometric transformation algorithms on the ARM computing platform. Experimental results show that, on the Huawei Kunpeng 920 platform, PerfIPP achieves a 108.08%~435.5% performance improvement on these basic image geometric transformations over the open source computer vision library OpenCV while meeting accuracy requirements, and reaches 83.79% of the average performance of the Intel IPP library on the Intel Xeon E5-2640 processor.

Key words: IPP, ARM, NEON Intrinsic, Geometric transformation, Interpolation
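
For readers unfamiliar with the operations named in the abstract, the following is a minimal scalar sketch of a mirror transform and of an affine warp with bilinear interpolation. It is only an illustration under stated assumptions: the function names (`bilinear_sample`, `warp_affine_inverse`, `mirror_horizontal`), the inverse-mapping formulation and the zero fill value are hypothetical choices made here, and the code does not reproduce PerfIPP's NEON-optimized implementation, which the abstract does not detail.

```python
def bilinear_sample(img, x, y, fill=0.0):
    """Sample a grayscale image (list of rows) at a real-valued (x, y).

    Coordinates outside the image return `fill` (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return fill
    x0, y0 = int(x), int(y)
    # Clamp the right/bottom neighbours so border pixels stay in range.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_affine_inverse(img, inv, out_w, out_h):
    """Affine warp by inverse mapping.

    inv = (a, b, c, d, e, f) maps a destination pixel (x, y) to the
    source position (a*x + b*y + c, d*x + e*y + f).
    """
    a, b, c, d, e, f = inv
    return [
        [bilinear_sample(img, a * x + b * y + c, d * x + e * y + f)
         for x in range(out_w)]
        for y in range(out_h)
    ]


def mirror_horizontal(img):
    """Mirror an image about its vertical axis by reversing each row."""
    return [row[::-1] for row in img]


if __name__ == "__main__":
    image = [[0, 10], [20, 30]]
    print(mirror_horizontal(image))
    # A horizontal mirror is also an affine map: x_src = (w - 1) - x_dst.
    print(warp_affine_inverse(image, (-1, 0, 1, 0, 1, 0), 2, 2))
```

In this sketch the mirror is expressed both directly and as a special case of the affine warp. An optimized library would typically treat the two separately, since a mirror needs no interpolation.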

CLC number: TP391

WANG Lu-han, JIA Hai-peng, ZHANG Yun-quan, ZHANG Guang-ting. Study on Implementation and Optimization of ARM-based Image Geometric Transformation Library[J]. Computer Science, 2022, 49(10): 10-17. https://doi.org/10.11896/jsjkx.220100128
