Computer Science ›› 2025, Vol. 52 ›› Issue (3): 268-276. doi: 10.11896/jsjkx.240100126
HAN Lin1, WANG Yifan2, LI Jianan1, GAO Wei1
Abstract: With the rapid development of artificial intelligence, new operators and hardware keep emerging, making the development and maintenance of operator libraries a major challenge: manual optimization alone can no longer meet the performance demands of AI models. Ansor, an automatic operator-scheduling technique built on TVM, searches for the best schedule of a deep learning model or operator on a given backend and generates high-performance code without requiring users to hand-write templates, but its enormous search space makes the search inefficient. This paper therefore proposes two optimizations: 1) reinforcement-learning-based selection of the best-performing sketch; and 2) machine-learning-based prediction of mutation rules. Both aim to shorten the search for the best schedule so that high-performance operators can be generated quickly. To evaluate the optimizations, three models including ResNet-50 and three operators including conv2d are tested. The results show that the optimized Ansor generates target programs whose performance matches or exceeds the original using only 70%-75% of the search time, and at the best iteration count the inference speed of the target programs improves by up to 5%.
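For context, the sketch below shows how a single operator is tuned with TVM's auto-scheduler, the upstream implementation of Ansor: a registered workload defines the compute, a SearchTask wraps one operator instance, and tune() runs the sketch-generation and evolutionary (mutation-based) search that the two proposed optimizations accelerate. This is a minimal illustration rather than the paper's experimental setup; the conv2d shape (the first convolution of ResNet-50) and the 1000-trial budget are assumptions chosen for the example.

```python
import tvm
from tvm import te, topi, auto_scheduler

# Register the workload so the auto-scheduler can rebuild it from the
# serialized task description.
@auto_scheduler.register_workload
def conv2d_nchw(N, H, W, CO, CI, KH, KW, stride, padding):
    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
    out = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1)
    return [data, kernel, out]

target = tvm.target.Target("llvm")  # swap in "cuda" etc. for other backends

# One search task = one operator instance (here: 7x7 conv, stride 2, pad 3).
task = auto_scheduler.SearchTask(
    func=conv2d_nchw,
    args=(1, 224, 224, 64, 3, 7, 7, 2, 3),
    target=target,
)

log_file = "conv2d.json"
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,  # search budget; illustrative, not the paper's setting
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=0,
)

# Sketch generation and evolutionary search (sketch mutation) happen inside tune().
task.tune(tune_option)
sch, args = task.apply_best(log_file)  # best schedule found so far
func = tvm.build(sch, args, target)    # compile it to a high-performance kernel
```

The search budget (num_measure_trials) dominates tuning time; the paper's two optimizations reduce how much of this search is needed before a schedule of comparable quality is found.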
References:
[1] CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: Efficient Primitives for Deep Learning [J]. arXiv:1410.0759, 2014.
[2] KHAN J, FULTZ P, TAMAZOV A, et al. MIOpen: An Open Source Library for Deep Learning Primitives [J]. arXiv:1910.00078, 2020.
[3] LI M Z, LIU Y, LIU X Y, et al. The Deep Learning Compiler: A Comprehensive Survey [J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 32(3): 708-727.
[4] XING Y, WENG J, WANG Y S, et al. An In-depth Comparison of Compilers for Deep Neural Networks on Hardware [C]// 2019 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2019: 1-8.
[5] CHEN T Q, MOREAU T, JIANG Z H, et al. TVM: End-to-End Optimization Stack for Deep Learning [C]// Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI '18). Carlsbad, USA: USENIX Association, 2018: 579-594.
[6] ZHAO J, LI B J, WANG N, et al. AKG: Automatic Kernel Generation for Neural Processing Units Using Polyhedral Transformations [C]// Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021). New York, USA: Association for Computing Machinery, 2021: 1233-1248.
[7] LATTNER C, AMINI M, BONDHUGULA U, et al. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation [C]// 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021: 2-14.
[8] ABADI M, BARHAM P, CHEN J M, et al. TensorFlow: Large-scale Machine Learning on Heterogeneous Distributed Systems [C]// Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI '16). USA: USENIX Association, 2016: 265-283.
[9] PASZKE A, GROSS S, MASSA F, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2019: 8026-8037.
[10] GASKILL B. ONNX: the Open Neural Network Exchange Format [J]. Linux Journal, 2018(285): 157-161.
[11] ROESCH J, LYUBOMIRSKY S, WEBER L, et al. Relay: A New IR for Machine Learning Frameworks [C]// Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018). New York, NY, USA: Association for Computing Machinery, 2018: 58-68.
[12] RAGAN-KELLEY J, BARNES C, ADAMS A, et al. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines [J]. ACM SIGPLAN Notices, 2013, 48(6): 519-530.
[13] CHEN T Q, ZHENG L M, YAN E, et al. Learning to Optimize Tensor Programs [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS '18). USA: Curran Associates Inc., 2018: 3393-3404.
[14] ZHENG L M, JIA C F, SUN M M, et al. Ansor: Generating High-Performance Tensor Programs for Deep Learning [C]// Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI '20). USENIX Association, 2020: 863-879.
[15] WU J J. A Deployment Method and Device for Heterogeneous Platforms Based on TVM Compiler: CN202010654954 [P]. 2023-12-25.
[16] CHEN T Q, GUESTRIN C. XGBoost: A Scalable Tree Boosting System [C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). New York, NY, USA: Association for Computing Machinery, 2016: 785-794.