Computer Science, 2022, Vol. 49, Issue (11): 76-82. doi: 10.11896/jsjkx.211200252

Computer Software

Optimization Method of Streaming Storage Based on GCC Compiler

GAO Xiu-wu, HUANG Liang-ming, JIANG Jun   

  1. Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083, China
  • Received: 2021-12-22  Revised: 2022-04-29  Online: 2022-11-15  Published: 2022-11-03
  • About authors: GAO Xiu-wu, born in 1992, postgraduate. His main research interests include architecture-oriented performance analysis and optimization, compiler optimization, etc.
    HUANG Liang-ming, born in 1988, Ph.D., assistant professor, is a member of the China Computer Federation. His main research interests include architecture-oriented performance analysis and optimization, compiler optimization, etc.
  • Supported by:
    National Key Research and Development Project (2020YFB0204602) and Comprehensive Research Project (Research on Compilation Optimization Improvement for Sunway Processor).

Abstract: To address the cache pollution and compulsory misses caused by streaming memory accesses, some high-performance general-purpose processor platforms provide a dedicated path, together with supporting instructions, for accessing main memory directly without going through the cache. Using such direct memory access in common scenarios such as streaming stores can improve the overall performance of the chip's memory system. However, deciding by hand when direct access to main memory is beneficial is a tedious and error-prone task for programmers, so an effective approach is to have the compiler apply it automatically. Based on an in-depth analysis of the benefits of different types of memory operations under the streaming-store access pattern, this paper proposes a streaming-store optimization method based on the GCC compiler. In the SSA-GIMPLE stage of GCC, contiguous writes and strided writes with streaming access characteristics in program loops are recognized, and optimization candidates are screened according to benefit analysis and dependence relations. Finally, direct-to-main-memory instructions are generated by matching instruction templates in the compiler back end. The contiguous-write and strided-write cases, the STREAM test set, and its variants are used for experimental evaluation on the domestic SW processor platform. The results show that the optimization significantly reduces the execution time of streaming-store applications: after optimization, the average speedup on the STREAM test set is 1.31, and when combined with loop unrolling the average speedup on the STREAM test set reaches 1.45.
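As a concrete illustration of what a streaming-store rewrite of a loop looks like, the sketch below hand-codes the STREAM triad kernel so that its output array is written with non-temporal (cache-bypassing) stores. It is not the paper's implementation and does not use SW instructions, which are not described here; instead it assumes an x86 target and uses the SSE2 intrinsics `_mm_stream_pd` and `_mm_sfence` as a stand-in for the direct-to-memory store path. The function name `triad_stream` is hypothetical, and on targets without SSE2 the code falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_sfence */
#endif

/* Hypothetical sketch: STREAM-style triad a[i] = b[i] + q * c[i], with the
   output array `a` written through streaming (non-temporal) stores.
   Assumption: x86 SSE2 streaming stores stand in for the SW processor's
   direct-to-main-memory instructions. `a` is only written, never read,
   inside the loop, so bypassing the cache avoids fetching its lines and
   evicting data that is still useful. */
void triad_stream(double *restrict a, const double *restrict b,
                  const double *restrict c, double q, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* _mm_stream_pd requires a 16-byte-aligned address, so peel scalar
       iterations until a + i is aligned (at most one for a double*). */
    while (i < n && ((uintptr_t)(a + i) & 15u) != 0) {
        a[i] = b[i] + q * c[i];
        i++;
    }
    assert(i == n || ((uintptr_t)(a + i) & 15u) == 0);

    const __m128d vq = _mm_set1_pd(q);
    for (; i + 2 <= n; i += 2) {
        __m128d vb = _mm_loadu_pd(b + i); /* inputs use ordinary cached loads */
        __m128d vc = _mm_loadu_pd(c + i);
        /* Output bypasses the cache via a non-temporal store. */
        _mm_stream_pd(a + i, _mm_add_pd(vb, _mm_mul_pd(vq, vc)));
    }
    /* Streaming stores are weakly ordered; fence so later code sees them. */
    _mm_sfence();
#endif
    /* Remainder iterations (or the whole loop without SSE2): plain stores. */
    for (; i < n; i++)
        a[i] = b[i] + q * c[i];
}
```

It is called like the plain loop, for example `triad_stream(a, b, c, 3.0, n)`. Only the contiguous write to `a` bypasses the cache, while the loads of `b` and `c` stay ordinary, matching the abstract's focus on store accesses; the scalar prologue exists because this x86 instruction requires 16-byte-aligned addresses. Strided writes, which the abstract also targets, are not covered by this sketch.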

Key words: GCC compiler, Direct memory access, Compiler optimization, Code generation, Domestic processor

CLC Number: TP311