Computer Science ›› 2024, Vol. 51 ›› Issue (1): 50-59.doi: 10.11896/jsjkx.230600051

• Special Issue on the 55th Anniversary of Computer Science •

Study on Model Migration of Natural Language Processing for Domestic Deep Learning Platform

GE Huibin1, WANG Dexin1, ZHENG Tao2, ZHANG Ting3, XIONG Deyi1   

  1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
    2 Nanjing Research Institute, Huawei Technologies Co., Ltd., Nanjing 210000, China
    3 Global Tone Communication Technology Co., Ltd., Beijing 100131, China
  • Received: 2023-06-06  Revised: 2023-10-07  Online: 2024-01-15  Published: 2024-01-12
  • About author: GE Huibin, born in 1997, postgraduate. His main research interests include deep learning and natural language processing.
    XIONG Deyi, born in 1979, Ph.D., professor, Ph.D. supervisor. His main research interests include natural language processing and machine translation.
  • Supported by:
    Huawei Technologies Co., Ltd. and Tianjin University NRE Cooperation Project (20101300441922C), National Key R&D Program of China (2020AAA0108000) and Key R&D Program of Yunnan Province (202203AA080004).

Abstract: Deep learning platforms play an essential role in the development of the new generation of artificial intelligence. In recent years, China's domestic high-performance AI software and hardware systems, represented by the Ascend platform, have developed rapidly, opening up a new path for deep learning platforms in China. At the same time, in order to expose and resolve potential defects in the Ascend system, its platform developers actively carry out the migration of commonly used deep learning models together with researchers. This paper furthers these efforts from the perspective of natural language processing, aiming at how to refine the domestic deep learning platform. Four natural language processing tasks are highlighted, namely neural machine translation, machine reading comprehension, sequence labeling and text classification, along with four classical neural models: ALBERT, RNNSearch, BERT-CRF and TextING. Their migration to the Ascend platform is described in detail. Based on this model migration study, the paper summarizes the deficiencies of the Ascend platform's architecture design for natural language processing research and applications. These deficiencies fall into four essential aspects: 1) the lack of dynamic memory allocation for computation-graph nodes; 2) incompatibility with sinking resource operators to the accelerator device side; 3) graph-kernel fusion that is not flexible enough to handle unseen model structures; and 4) defects in the mixed-precision training strategy. To overcome these problems, the paper puts forward workarounds or solutions. Finally, constructive suggestions are provided for, but not limited to, deep learning platforms in China.
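To make deficiencies 1) and 4) concrete, two minimal Python sketches follow. They are illustrative additions rather than code from the paper, and every name, bucket size and threshold in them is a hypothetical choice. The first shows the usual workaround when a platform lacks dynamic shape support for computation-graph nodes: variable-length batches are padded into a small set of fixed-length buckets, so the accelerator compiles only a handful of static graphs.

    import numpy as np

    BUCKETS = [32, 64, 128]  # hypothetical fixed sequence lengths
    PAD_ID = 0               # hypothetical padding token id

    def pad_to_bucket(batch):
        # Pad a list of token-id lists to the smallest bucket that fits,
        # truncating anything longer than the largest bucket.
        longest = max(len(seq) for seq in batch)
        bucket = next((b for b in BUCKETS if b >= longest), BUCKETS[-1])
        ids = np.full((len(batch), bucket), PAD_ID, dtype=np.int32)
        mask = np.zeros((len(batch), bucket), dtype=np.float32)
        for i, seq in enumerate(batch):
            seq = seq[:bucket]
            ids[i, :len(seq)] = seq
            mask[i, :len(seq)] = 1.0  # attention mask marks the real tokens
        return ids, mask

    ids, mask = pad_to_bucket([[5, 9, 2], [7, 1, 4, 4, 8]])
    print(ids.shape)  # (2, 32): one of only len(BUCKETS) static shapes

The second sketches framework-agnostic dynamic loss scaling, the standard remedy for fp16 underflow and overflow in mixed-precision training (Micikevicius et al., 2018): the loss is multiplied by a scale before the half-precision backward pass, the gradients are unscaled afterwards, and the update is skipped whenever inf/NaN gradients signal overflow.

    class DynamicLossScaler:
        # A sketch of dynamic loss scaling; constants are conventional defaults.
        def __init__(self, scale=2.0**15, factor=2.0, window=2000):
            self.scale, self.factor, self.window = scale, factor, window
            self.good_steps = 0

        def scale_loss(self, loss):
            # Multiply the loss so small fp16 gradients do not underflow.
            return loss * self.scale

        def unscale_and_check(self, grads):
            # Return (unscaled grads, True) if the step is safe to apply;
            # on inf/NaN, shrink the scale and tell the caller to skip the step.
            if any(not np.all(np.isfinite(g)) for g in grads):
                self.scale = max(self.scale / self.factor, 1.0)
                self.good_steps = 0
                return None, False
            grads = [g / self.scale for g in grads]
            self.good_steps += 1
            if self.good_steps >= self.window:  # long stable run: grow the scale
                self.scale *= self.factor
                self.good_steps = 0
            return grads, True

Both sketches trade a little wasted computation (padding) or an occasionally skipped update (overflow) for shape and numerical stability on the accelerator.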

Key words: Natural language processing, Ascend, Deep learning, Model migration, Platform architecture

CLC Number: TP183