Computer Science, 2022, Vol. 49, Issue (10): 10-17. doi: 10.11896/jsjkx.220100128

• High Performance Computing •

Study on Implementation and Optimization of ARM-based Image Geometric Transformation Library

WANG Lu-han1,2, JIA Hai-peng1, ZHANG Yun-quan1, ZHANG Guang-ting1   

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-01-14 Revised: 2022-04-28 Online: 2022-10-15 Published: 2022-10-13
  • About author: WANG Lu-han, born in 1999, postgraduate. His main research interests include high performance computing and parallel software.
    JIA Hai-peng, born in 1983, Ph.D. His main research interests include high performance computing, many-core programming methods and key optimization technologies for many-core platforms.
  • Supported by:
    National Key R&D Program of China (2017YFB0202105), National Natural Science Foundation of China (61972376) and Natural Science Foundation of Beijing, China (L182053).

Abstract: Intel Integrated Performance Primitives (IPP) is a high-performance multimedia acceleration library for signal and image processing. However, there is currently no high-performance IPP library for the ARM architecture. This paper implements PerfIPP, a high-performance algorithm library on the ARM computing platform that covers basic image geometric transformation algorithms such as mirror, remap, and affine/perspective transformation. PerfIPP is optimized through SIMD assembly, memory alignment, data pre-calculation, and high-performance matrix optimization techniques, which significantly improve the performance of these algorithms. This paper also summarizes the key technologies for implementing and optimizing image geometric transformation algorithms on the ARM computing platform by comparing the performance differences caused by different instruction combinations, different instruction arrangements, and different memory access methods. Experimental results show that, on the Huawei Kunpeng 920 platform, PerfIPP achieves a 108.08%~435.5% performance improvement in image transformation over the open-source computer vision library while meeting accuracy requirements. On average, it also reaches 83.79% of the performance of the Intel IPP library running on an Intel Xeon E5-2640 processor.
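
As an illustration of the kind of SIMD optimization the abstract describes for the mirror operation, the following is a minimal sketch and not PerfIPP's actual code. It assumes 8-bit single-channel pixels and non-overlapping source and destination buffers. The function names mirror_row_u8 and mirror_image_u8 are hypothetical and are not part of PerfIPP or Intel IPP. On ARM targets with NEON, 16 pixels are reversed per iteration with the intrinsics vrev64q_u8 and vextq_u8; elsewhere, and for the row tail, a scalar loop is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: horizontally mirror one row of 8-bit pixels.
   src and dst must not overlap (in-place mirroring is not handled). */
static void mirror_row_u8(const uint8_t *src, uint8_t *dst, size_t width)
{
    assert(src != dst);
    size_t x = 0;
#if defined(__ARM_NEON)
    /* Process 16 pixels at a time. */
    for (; x + 16 <= width; x += 16) {
        uint8x16_t v = vld1q_u8(src + x);
        v = vrev64q_u8(v);      /* reverse bytes inside each 64-bit half */
        v = vextq_u8(v, v, 8);  /* swap the two halves: full 16-byte reversal */
        vst1q_u8(dst + width - 16 - x, v);
    }
#endif
    /* Scalar path: whole row without NEON, or the remaining tail with it. */
    for (; x < width; ++x)
        dst[width - 1 - x] = src[x];
}

/* Hypothetical helper: mirror a whole image row by row.
   src_step and dst_step are row strides in bytes (an assumed layout). */
static void mirror_image_u8(const uint8_t *src, size_t src_step,
                            uint8_t *dst, size_t dst_step,
                            size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        mirror_row_u8(src + y * src_step, dst + y * dst_step, width);
}
```

A hand-tuned version, as the abstract suggests, would additionally consider memory alignment and instruction ordering; this sketch leaves those choices to the compiler.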

Key words: IPP, ARM, NEON Intrinsic, Geometry Transforms, Interpolation

CLC Number: 

  • TP391