Computer Science ›› 2025, Vol. 52 ›› Issue (3): 268-276. doi: 10.11896/jsjkx.240100126

• Artificial Intelligence •


Automatic Scheduling Search Optimization Method Based on TVM

HAN Lin1, WANG Yifan2, LI Jianan1, GAO Wei1   

  1. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
    2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2024-01-12  Revised: 2024-06-23  Online: 2025-03-15  Published: 2025-03-07
  • Corresponding author: GAO Wei (yongwu22@126.com)
  • About author: HAN Lin (hanlin@zzu.edu.cn), born in 1978, Ph.D., associate professor, is a senior member of CCF (No.16416M). His main research interests include compiler optimization and high-performance computing.
    GAO Wei, born in 1988, Ph.D. His main research interests include AI compiler optimization and advanced compilation technology.
  • Supported by:
    Major Science and Technology Special Project of Henan Province(221100210600).



Abstract: With the rapid development of artificial intelligence and the continuous emergence of new operators and hardware, the development and maintenance of operator libraries face enormous challenges, and manual optimization alone can no longer meet the performance demands of AI models. Ansor is an automatic operator scheduling technique based on TVM that searches for the best scheduling scheme for a deep learning model or operator on different backends and generates high-performance code without requiring users to manually define templates; however, its huge search space results in low search efficiency. Therefore, two optimization schemes are proposed: one selects the best-performing sketch with a reinforcement learning algorithm, and the other predicts mutation rules with a machine learning model. Both schemes aim to shorten the search time for the optimal scheduling scheme and to generate high-performance operators quickly. To evaluate their effectiveness, three models, including ResNet-50, and three operators, including conv2d, are tested. The results show that the optimized Ansor generates target programs whose performance matches or exceeds the original in only 70% to 75% of the search time, and that at the best iteration count the inference speed of the target program improves by up to 5%.
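For context, the baseline that both schemes accelerate is Ansor's template-free auto-scheduling flow in TVM. The sketch below uses TVM's public auto_scheduler Python API, not the modified Ansor described in the paper, to tune a single conv2d operator end to end; the workload shape (ResNet-50's first convolution) and the log-file name are illustrative assumptions.

```python
import tvm
from tvm import auto_scheduler, te, topi

# Register a conv2d workload in NCHW layout; the shapes are illustrative.
@auto_scheduler.register_workload
def conv2d_workload(N, H, W, CI, CO, KH, KW, stride, padding):
    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
    out = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1)
    return [data, kernel, out]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(
    func=conv2d_workload,
    args=(1, 224, 224, 3, 64, 7, 7, 2, 3),  # N,H,W,CI,CO,KH,KW,stride,pad
    target=target,
)

# Every measured candidate is logged; the search loop driving these trials
# is where sketch selection and mutation-rule choice happen inside Ansor.
log_file = "conv2d_tuning.json"
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# Apply the best schedule found and compile the operator.
sch, args = task.apply_best(log_file)
func = tvm.build(sch, args, target)
```

This page does not give the paper's actual reinforcement learning formulation, so the next block is only a minimal, hypothetical sketch of the idea behind scheme 1: treat the sketches Ansor generates for a task as arms of a multi-armed bandit and steer sampling toward the sketch whose measured programs perform best. The class name and the ε-greedy policy are assumptions, not the authors' code; scheme 2 could analogously replace Ansor's random choice of an evolutionary-search mutation rule with a learned model's prediction.

```python
import random

class EpsilonGreedySketchSelector:
    """Hypothetical bandit over generated sketches (illustration only)."""

    def __init__(self, num_sketches: int, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_sketches    # how often each sketch was sampled
        self.values = [0.0] * num_sketches  # running mean reward, e.g. GFLOPS

    def select(self) -> int:
        if random.random() < self.epsilon:  # explore: pick a random sketch
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, sketch_id: int, reward: float) -> None:
        """Fold the measured throughput of a sampled program into the running mean."""
        self.counts[sketch_id] += 1
        self.values[sketch_id] += (reward - self.values[sketch_id]) / self.counts[sketch_id]
```

In such a scheme, each tuning round would call select() to pick the sketch to sample programs from and update() with the measured throughput, so later rounds spend fewer trials on sketches that consistently yield slow programs.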

Key words: Automatic scheduling, TVM compiler, Search speed optimization, Machine learning, Reinforcement learning, Deep learning model

CLC number: TP311