Computer Science ›› 2014, Vol. 41 ›› Issue (6): 12-17. doi: 10.11896/j.issn.1002-137X.2014.06.003


Access Optimization Technique for Mathematical Library of Slave Processors on Heterogeneous Many-core Architectures

XU Jin-chen, GUO Shao-zhong, HUANG Yong-zhong and WANG Lei

  • Online:2018-11-14 Published:2018-11-14

Abstract: The algorithms behind mathematical functions inherently involve a large number of memory access operations. On heterogeneous many-core architectures, which are becoming increasingly common, the slave processors access data through shared memory, which severely limits the access rate. As a result, the performance of mathematical library functions cannot meet the requirements of high-performance computing. To address this problem, this study proposes a scheduling strategy based on memory access instructions that hides access latency behind necessary computation. In addition, using a dynamic calling mode, an algorithm called ldm_call is introduced that exploits the LDM (local data memory) of the slave processors and significantly speeds up access. Both optimization techniques are generally applicable to shared-memory settings; together they reduce the access frequency and increase the access rate. Experimental results show that they improve function performance by 16.08% and 37.32% on average, respectively.

Key words: Heterogeneous many-core, Mathematical library, Access optimization, Instruction scheduling, Local data memory
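
The two ideas in the abstract can be illustrated with a small, hedged sketch in C. It is not the authors' implementation: the table contents, the function names exp2_frac and exp2_batch_local, and the use of a stack buffer as a stand-in for LDM are all assumptions made for illustration, and the actual ldm_call interface is not described on this page. The first function issues the table load early so that independent polynomial work can overlap with it; the second copies the table into local storage once and then reuses it for a whole batch.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 4

/* Hypothetical stand-in for a lookup table held in slow shared memory:
 * entry j holds 2^(j/4). */
const double shared_table[TABLE_SIZE] = {
    1.0, 1.189207115002721, 1.4142135623730951, 1.681792830507429
};

/* Approximates 2^x for 0 <= x < 1 using a table lookup plus a short
 * polynomial. The table load is placed before the polynomial, which does
 * not depend on it, so the computation can cover the load latency. */
double exp2_frac(const double *table, double x)
{
    int j = (int)(x * TABLE_SIZE);          /* table index, 0..3 */
    double scale = table[j];                /* memory access issued early */

    /* Independent work: Taylor polynomial for 2^r with small r. */
    double r = x - (double)j / TABLE_SIZE;  /* 0 <= r < 0.25 */
    double t = r * 0.6931471805599453;      /* r * ln(2) */
    double poly = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t * (1.0 / 24.0))));

    return scale * poly;                    /* loaded value first used here */
}

/* Copies the table into a local buffer once (an assumed stand-in for LDM),
 * then evaluates every element against the local copy, so the shared
 * table is read once per batch instead of once per call. */
void exp2_batch_local(const double *in, double *out, size_t n)
{
    double local_table[TABLE_SIZE];
    memcpy(local_table, shared_table, sizeof local_table);

    for (size_t i = 0; i < n; i++) {
        out[i] = exp2_frac(local_table, in[i]);
    }
}
```

On an ordinary CPU an optimizing compiler may reorder these statements on its own, so the source ordering here only shows the intent of the scheduling strategy; any real benefit depends on the target hardware and toolchain.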

