Computer Science ›› 2024, Vol. 51 ›› Issue (6): 52-60. doi: 10.11896/jsjkx.230800049
• Computer Software •
LIU Lei¹, ZHOU Zhide¹, LIU Xingxiang², CHE Haoyang³, YAO Lei³, JIANG He¹