Computer Science ›› 2024, Vol. 51 ›› Issue (7): 22-28. doi: 10.11896/jsjkx.230500220
杨恒1,2, 刘勤让2, 范旺2, 裴雪2, 魏帅2, 王轩1,2
YANG Heng1,2, LIU Qinrang2, FAN Wang2, PEI Xue2, WEI Shuai2, WANG Xuan1,2
Abstract: With the rapid development of deep learning and hardware architectures, the diversity of models and hardware makes high-performance deployment of deep learning models through hand-tuned optimization a serious challenge, so existing AI compiler frameworks typically adopt auto-scheduling to carry out this process. However, the existing TVM auto-scheduling approach suffers from two problems: an imbalanced cost-model dataset and excessively long scheduling time. To address them, this paper proposes an auto-scheduling optimization method based on feature importance. The method first analyzes feature importance with the XGBoost algorithm, then reduces the feature dimensionality according to the importance coefficients and reassigns the data label values, so as to improve the accuracy of the cost model and the efficiency of auto-scheduling. Experimental results show that the proposed method shortens the auto-scheduling time of three deep learning models by 9.7% to 17.2% and reduces inference time by up to 15%.
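The pipeline described in the abstract can be illustrated with a short sketch. The Python fragment below is a minimal, hypothetical illustration of XGBoost-based importance analysis, feature pruning, and label reassignment on a cost-model dataset; the dataset shape, the pruning threshold, and the label-reassignment rule are illustrative assumptions, not the paper's released implementation.

import numpy as np
import xgboost as xgb

# Toy stand-in for a TVM cost-model dataset: each row is a schedule
# feature vector, each label a measured performance score in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((1000, 164))   # the 164-dim feature vector is an assumption
y = rng.random(1000)

# Step 1: fit an XGBoost regressor and read out per-feature importance.
model = xgb.XGBRegressor(n_estimators=100, max_depth=6)
model.fit(X, y)
importance = model.feature_importances_   # importance coefficients

# Step 2: reduce feature dimensionality by dropping low-importance
# columns (the threshold here is an illustrative choice).
keep = importance > 0.1 * importance.mean()
X_reduced = X[:, keep]
print(f"kept {int(keep.sum())} of {X.shape[1]} features")

# Step 3: reassign label values to counter dataset imbalance, e.g. by
# stretching the distribution so the rare high-performing schedules
# carry more weight (the paper's exact reassignment rule is not
# reproduced here).
y_rebalanced = np.sqrt(y / y.max())

A smaller, rebalanced training set of this kind is what lets the cost model rank candidate schedules more accurately while each tuning round spends less time on featurization and model fitting.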