Computer Science ›› 2025, Vol. 52 ›› Issue (5): 58-66.doi: 10.11896/jsjkx.240100018
• High Performance Computing • Previous Articles Next Articles
GAO Wei1, WANG Lei1,2, LI Jianan1,2, LI Shuailong1,2, HAN Lin1
CLC Number:
[1]ABADI M,BARHAM P,CHEN J,et al.TensorFlow:a system for large-scale machine learning[C]//12th USENIX Symposium on Operating Systems Design and Implementation(OSDI 16).Savannah,GA,USA,Berkeley,CA,USA:USENIX Association,2016:265-283. [2]PASZKE A,GROSS S,MASSA F,et al.Pytorch:an imperative style,high-performance deep learning library[J].Advances in Neural Information Processing Systems,2019,32:8026-8037. [3]LI M Z,LIU Y,LIU X Y,et al.The deep learning compiler:a comprehensive survey[J].IEEE Transactions on Parallel and Distributed Systems,2021,32(3):708-727. [4]ZHANG H B,XING M J,WU Y J,et al.Compiler technologies in deep learning co-design:a survey[J/OL].https://spj.science.org/doi/10.34133/icomputing.0040. [5]LATTNER C,AMINI M,BONDHUGULA U,et al.MLIR:a compiler infrastructure for the end of Moore's law[J].arXiv:2002.11054,2020. [6]CHEN T Q,MOREAU T,JIANG Z H,et al.TVM:an automa-ted end-to-end optimizing compiler for deep learning[C]//13th USENIX Symposium on Operating Systems Design and Implementation(OSDI 18).Carlsbad,CA,USA,Berkeley,CA,USA:USENIX Association,2018:578-594. [7]ROESCH J,LYUBOMIRSKY S,WEBER L,et al.Relay:a newIR for machine learning frameworks[C]//Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages(MAPL 2018).Philadelphia PA,USA,New York,NY,USA:Association for Computing Machinery,2018:58-68. [8]FENG S Y,HOU B H,JIN H Y,et al.TensorIR:an abstraction for automatic tensorized program optimization[C]//Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems,Volume 2(ASPLOS 2023).Vancouver,BC,Canada,New York,NY,USA:Association for Computing Machinery,2023:804-817. [9]RAGAN-KELLEY J,BARNES C,ADAMS A,et al.Halide:alanguage and compiler for optimizing parallelism,locality,and recomputation in image processing pipelines [C]//Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI'13).Seattle,WA,USA,New York,NY,USA:Association for Computing Machinery,2013:519-530. [10]SNIDER D,LIANG R F.Operator fusion in XLA:analysis and evaluation[J].arXiv:2301.13062,2023. [11]JIA Z H,PADON O,THOMAS J,et al.TASO:optimizing deep learning computation with automatic generation of graph substitutions[C]//Proceedings of the 27th ACM Symposium on Operating Systems Principles(SOSP'19).Huntsville,Ontario,Canada,New York,NY,USA:Association for Computing Machinery,2019:47-62. [12]NIU W,GUAN J X,WANG Y Z,et al.DNNFusion:accelera-ting deep neural networks execution with advanced operator fusion[C]//Proceedings of the 42nd ACM SIGPLAN InternationalConference on Programming Language Design and Implementation(PLDI 2021),Vitural Canada.New York,NY,USA:Association for Computing Machinery,2021:883-898. [13]JIA Z H,THOMAS J,WARSZAWSKI T,et al.OptimizingDNN computation with relaxed graph substitutions [J].Proceedings of Machine Learning and Systems,2019,1:27-39. [14]FANG J Z,SHEN Y Y,WANG Y,et al.Optimizing DNN computation graph using graph gubstitutions[J].Proceedings of the VLDB Endowment,2020,13(12):2734-2746. [15]CAI X Y,WANG Y,ZHANG L.Optimus:an operator fusion framework for deep neural networks[J].ACM Transactions on Embedded Computing Systems,2022,22(1):1-26. [16]ZHENG Z,YANG X D,ZHAO P Z,et al.AStitch:enabling anew multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures[C]//Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS 2022).Lausanne Switzerland,New York,NY,USA:Association for Computing Machinery,2022:359-373. [17]ZHAO J,GAO X,XIA R J,et al.Apollo:automatic partition-based operator fusion through layer by layer optimization[J].Proceedings of Machine Learning and Systems,2022,4:1-19. [18]MA L X,XIE Z Q,YANG Z,et al.RAMMER:enabling holistic deep learning compiler optimizations with rtasks[C]//14th USENIX Conference on Operating Systems Design and Implementation(OSDI 20).Berkeley,CA,USA:USENIX Association,2020:881-897. |
[1] | DAI Hanwen, CHEN Changbo. Accelerating Batched Matrix Multiplication for Variable Small Sizes Based on TVM andApplications [J]. Computer Science, 2025, 52(5): 25-40. |
[2] | HAN Lin, WANG Yifan, LI Jianan, GAO Wei. Automatic Scheduling Search Optimization Method Based on TVM [J]. Computer Science, 2025, 52(3): 268-276. |
[3] | XU Jinlong, GUI Zhonghua, LI Jia'nan, LI Yingying, HAN Lin. FP8 Quantization and Inference Memory Optimization Based on MLIR [J]. Computer Science, 2024, 51(9): 112-120. |
[4] | CHEN Yunliang, LIU Hao, ZHU Guishui, HUANG Xiaohui, CHEN Xiaodao, WANG Lizhe. Study on Supervised Learning Model for Optimal Histogram Solution [J]. Computer Science, 2023, 50(9): 145-151. |
[5] | CHEN Ying, HAO Ying-guang, WANG Hong-yu, WANG Kun. Dynamic Programming Track-Before-Detect Algorithm Based on Local Gradient and Intensity Map [J]. Computer Science, 2022, 49(8): 150-156. |
[6] | LIN Bao-ling, JIA Ri-heng, LIN Fei-long, ZHENG Zhong-long, LI Ming-lu. Multi-armed Bandit Model Based on Time-variant Budgets [J]. Computer Science, 2022, 49(11A): 210800212-6. |
[7] | MA Xin-yu, JIANG Chun-mao, HUANG Chun-mei. Optimal Scheduling of Cloud Task Based on Three-way Clustering [J]. Computer Science, 2022, 49(11A): 211100139-7. |
[8] | LI Shuang-gang, ZHANG Shuang, WANG Xing-wei. Cloud Resource Scheduling Mechanism Based on Adaptive Virtual Machine Migration [J]. Computer Science, 2020, 47(9): 238-245. |
[9] | ZHU Ying,XIA Yi-li,PEI Wen-jiang. Fusion of Infrared and Color Visible Images Based on Improved BEMD [J]. Computer Science, 2020, 47(3): 124-129. |
[10] | WANG Zheng-li, XIE Tian, HE Kun and JIN Yan. 0-1 Knapsack Variant with Time Scheduling [J]. Computer Science, 2018, 45(4): 53-59. |
[11] | ZHANG Xun, GU Chun-hua, LUO Fei, CHANG Yao-hui and WEN Geng. Virtual Machine Placement Strategy Based on Dynamic Programming [J]. Computer Science, 2017, 44(8): 54-59. |
[12] | ZHOU Huan-huan and JIANG Ying. Test Configuration Method Based on Dynamic Programming under Cloud Environment [J]. Computer Science, 2014, 41(9): 215-219. |
[13] | LIU Kun-liang,ZHANG Da-kun and WU Ji-gang. Improved Algorithm for Finding Weight-constrained Maximum-density Path [J]. Computer Science, 2014, 41(8): 122-124. |
[14] | . Solving Dynamic 0-1 Knapsack Problems Based on Dynamic Programming Algorithm [J]. Computer Science, 2012, 39(7): 237-241. |
[15] | . Decentralized Multi-Agent Based Cooperative Path Planning for Multi-UAVs [J]. Computer Science, 2012, 39(1): 219-222. |
|