Computer Science ›› 2017, Vol. 44 ›› Issue (2): 70-74, 81. doi: 10.11896/j.issn.1002-137X.2017.02.008


Method of Loop Distribution and Aggregation for Partial Vectorization

HAN Lin, XU Jin-long, LI Ying-ying and WANG Yang   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: Many loops contain a few unvectorizable statements alongside many vectorizable ones. Loop distribution separates these statements into different loops so that partial vectorization can be achieved. Currently, mainstream optimizing compilers support only a simple, aggressive form of loop distribution, which results in large loop overhead and poor reuse of registers and cache. To solve these problems, a method of loop distribution and aggregation for partial vectorization is proposed. First, two key issues in loop distribution are analyzed: the grouping of statements and the execution order of the distributed loops. Second, a modified topological sorting method is presented to achieve better loop aggregation, which reduces loop overhead. Finally, the proposed method is evaluated experimentally. The results show that it produces correct SIMD code and significantly improves the execution efficiency of the program.
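
As an illustration of the idea only, the following C sketch contrasts a loop, its simple one-statement-per-loop distribution, and a distributed-then-aggregated form. The loop body, the function names and the helper versions_agree are hypothetical and are not taken from the paper. The grouping is chosen by hand here rather than by the modified topological sorting method the abstract describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_N 64  /* hypothetical buffer size used by the demo helper */

/* Original loop. S1 and S3 have no loop-carried dependence; S2 reads
   c[i - 1], a recurrence that blocks vectorization of the whole loop. */
static void loop_original(size_t n, const int *a, const int *b,
                          int *c, int *d, int *e)
{
    for (size_t i = 1; i < n; i++) {
        d[i] = a[i] + b[i];      /* S1: vectorizable */
        c[i] = c[i - 1] + d[i];  /* S2: recurrence, stays scalar */
        e[i] = a[i] * b[i];      /* S3: vectorizable */
    }
}

/* Simple, aggressive distribution: one loop per statement (three loops). */
static void loop_distributed(size_t n, const int *a, const int *b,
                             int *c, int *d, int *e)
{
    for (size_t i = 1; i < n; i++) d[i] = a[i] + b[i];
    for (size_t i = 1; i < n; i++) c[i] = c[i - 1] + d[i];
    for (size_t i = 1; i < n; i++) e[i] = a[i] * b[i];
}

/* Distribution followed by aggregation: S1 and S3 share one vectorizable
   loop, so a[i] and b[i] are loaded once and one loop's overhead is saved.
   S2 runs afterwards in its own scalar loop because it reads d[i]. */
static void loop_aggregated(size_t n, const int *a, const int *b,
                            int *c, int *d, int *e)
{
    for (size_t i = 1; i < n; i++) {
        d[i] = a[i] + b[i];
        e[i] = a[i] * b[i];
    }
    for (size_t i = 1; i < n; i++) c[i] = c[i - 1] + d[i];
}

/* Returns 1 if all three versions produce identical arrays, else 0. */
int versions_agree(size_t n)
{
    int a[MAX_N], b[MAX_N];
    int c[3][MAX_N], d[3][MAX_N], e[3][MAX_N];

    assert(n <= MAX_N);
    for (size_t i = 0; i < MAX_N; i++) {
        a[i] = (int)i;
        b[i] = (int)(2 * i + 1);
    }
    memset(c, 0, sizeof c);
    memset(d, 0, sizeof d);
    memset(e, 0, sizeof e);
    c[0][0] = c[1][0] = c[2][0] = 7;  /* same seed for the recurrence */

    loop_original(n, a, b, c[0], d[0], e[0]);
    loop_distributed(n, a, b, c[1], d[1], e[1]);
    loop_aggregated(n, a, b, c[2], d[2], e[2]);

    return memcmp(c[0], c[1], sizeof c[0]) == 0 &&
           memcmp(c[0], c[2], sizeof c[0]) == 0 &&
           memcmp(d[0], d[1], sizeof d[0]) == 0 &&
           memcmp(d[0], d[2], sizeof d[0]) == 0 &&
           memcmp(e[0], e[1], sizeof e[0]) == 0 &&
           memcmp(e[0], e[2], sizeof e[0]) == 0;
}
```

In this sketch the only dependence between statements is S1 to S2 through d[i], so any ordering that runs the loop holding S1 before the loop holding S2 preserves the original results. Integer arrays are used so that the three versions can be compared exactly.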

Key words: Partial vectorization, Loop distribution, Loop aggregation, Aggregation graph

