Computer Science, 2024, Vol. 51, Issue (7): 22-28. doi: 10.11896/jsjkx.230500220

• Computer Software •

  • Corresponding author: LIU Qinrang (tsaliuqr@163.com)
  • Author contact: (yh1423828145@163.com)

Study on Deep Learning Automatic Scheduling Optimization Based on Feature Importance

YANG Heng1,2, LIU Qinrang2, FAN Wang2, PEI Xue2, WEI Shuai2, WANG Xuan1,2   

  1. College of Cyberspace Security, Zhengzhou University, Zhengzhou 450003, China
    2. Institute of Information Technology, Information Engineering University, Zhengzhou 450002, China
  • Received: 2023-05-30 Revised: 2023-10-13 Online: 2024-07-15 Published: 2024-07-10
  • About author: YANG Heng, born in 1998, postgraduate. His main research interest is deep learning compilers.
    LIU Qinrang, born in 1975, Ph.D., professor, Ph.D. supervisor. His main research interests include cyberspace security and chip design.
  • Supported by:
    Major Project of National Key R&D Program of China (2022YFB4401401) and Program of Songshan Laboratory (included in the management of Major Science and Technology Program of Henan Province) (221100211100-01).



Abstract: With the rapid development of deep learning and hardware architectures, the growing diversity of models and hardware makes it increasingly challenging to deploy deep learning models with high performance through manual optimization, so current AI compiler frameworks often adopt automatic scheduling instead. However, the existing TVM automatic scheduling optimization suffers from unbalanced cost-model datasets and overlong scheduling time. To address these issues, this paper proposes an automatic scheduling optimization method based on feature importance. First, feature importance is analyzed with the xgboost algorithm; then, guided by the importance coefficients, the feature dimensions are reduced and the data labels are reassigned, improving the precision of the cost model and the efficiency of automatic scheduling. Experimental results show that the proposed method shortens the automatic scheduling time of three deep learning models by 9.7% to 17.2% and reduces inference time by up to 15%.
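The two steps named in the abstract — dropping low-importance feature dimensions and reassigning label values to rebalance the cost-model dataset — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `importance` vector is assumed to come from an already-trained xgboost model (e.g. its `feature_importances_` attribute), and both `keep_ratio` and the per-task throughput normalization used for label reassignment are illustrative assumptions.

```python
def select_features(rows, importance, keep_ratio=0.5):
    """Keep only the feature columns whose importance score ranks in the
    top keep_ratio fraction (illustrative dimension-reduction step)."""
    n_keep = max(1, int(len(importance) * keep_ratio))
    # Column indices sorted by importance, highest first.
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [[row[i] for i in kept] for row in rows], kept

def reassign_labels(throughputs):
    """Rescale raw throughput labels to [0, 1] relative to the best
    measured program — one simple way to rebalance the label
    distribution (assumed scheme; the paper's exact rule may differ)."""
    best = max(throughputs)
    if best <= 0:
        return [0.0] * len(throughputs)
    return [t / best for t in throughputs]

# Example: 4 feature columns, keep the 2 most important ones.
rows = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
importance = [0.10, 0.40, 0.05, 0.45]   # e.g. model.feature_importances_
reduced, kept = select_features(rows, importance, keep_ratio=0.5)
labels = reassign_labels([2.0, 4.0, 1.0])
```

Here `kept` is `[1, 3]` (the two highest-importance columns), `reduced` is `[[2.0, 4.0], [6.0, 8.0]]`, and `labels` is `[0.5, 1.0, 0.25]`; in the paper's pipeline the reduced features and reassigned labels would then be used to retrain the auto-scheduler's cost model.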

Key words: AI compiler, Automatic scheduling, xgboost, Feature importance, Deep learning

CLC Number: TP302