Computer Science ›› 2024, Vol. 51 ›› Issue (1): 50-59.doi: 10.11896/jsjkx.230600051

• Special Issue on the 55th Anniversary of Computer Science •

Study on Model Migration of Natural Language Processing for Domestic Deep Learning Platform

GE Huibin1, WANG Dexin1, ZHENG Tao2, ZHANG Ting3, XIONG Deyi1   

  1 College of Intelligence and Computing,Tianjin University,Tianjin 300350,China
    2 Nanjing Research Institute,Huawei Technologies Co.Ltd.,Nanjing 210000,China
    3 Global Tone Communication Technology Co.,Ltd.,Beijing 100131,China
  • Received:2023-06-06 Revised:2023-10-07 Online:2024-01-15 Published:2024-01-12
  • About author:GE Huibin,born in 1997,postgraduate.His main research interests include deep learning and natural language processing.
    XIONG Deyi,born in 1979,Ph.D.,professor,Ph.D. supervisor.His main research interests include natural language processing and machine translation.
  • Supported by:
    Huawei Technologies Co., Ltd. and Tianjin University NRE Cooperation Project(20101300441922C),National Key R & D Program of China(2020AAA0108000)and Key R & D Program of Yunnan Province(202203AA080004).

Abstract: Deep learning platforms play an essential role in the development of the new generation of artificial intelligence.In recent years,China's domestic high-performance AI software and hardware systems,represented by the Ascend platform,have developed rapidly,opening up a new path for deep learning platforms in China.At the same time,to expose and resolve potential flaws in the Ascend system,the platform's developers actively carry out the migration of commonly used deep learning models together with researchers.This paper advances these efforts from the perspective of natural language processing,aiming at how to refine the domestic deep learning platform.Four natural language processing tasks are highlighted,namely neural machine translation,machine reading comprehension,sequence labeling and text classification,together with four classical neural models:ALBERT,RNNSearch,BERT-CRF and TextING.Their migration to the Ascend platform is described in detail.Based on this model migration study,this paper summarizes the deficiencies of the Ascend platform's architecture design for natural language processing research and applications.These deficiencies fall into four essential aspects:1)the lack of dynamic memory allocation for computation graph nodes;2)incompatibility with sinking resource operators to the accelerator device side;3)graph-operator fusion that is not flexible enough to handle unseen model structures;and 4)defects in the mixed-precision training strategy.For these problems,this paper puts forward workarounds or solutions.Finally,constructive suggestions are provided for,including but not limited to,the deep learning platforms in China.
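Of the four deficiencies above, the mixed-precision one is the easiest to illustrate outside the Ascend stack. The sketch below is plain NumPy, not Ascend/MindSpore code, and the gradient value and loss scale are illustrative assumptions; it shows the core idea behind loss scaling in mixed-precision training: gradients smaller than float16's smallest subnormal (about 5.96e-8) underflow to zero, so the loss is multiplied by a scale factor before the backward pass and the gradients are divided by it afterwards in float32.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the paper):
grad_fp32 = 1e-8      # a gradient below the float16 representable range
scale = 65536.0       # a power-of-two loss scale

# Without scaling: the cast to float16 loses the gradient entirely.
unscaled = np.float16(grad_fp32)              # underflows to 0.0

# With scaling: the scaled value survives the float16 cast, and the
# unscale step is performed in float32, as a master-weight update would be.
scaled_fp16 = np.float16(grad_fp32 * scale)
recovered = np.float32(scaled_fp16) / np.float32(scale)

print(unscaled)   # 0.0
print(recovered)  # close to 1e-8
```

Production mixed-precision implementations additionally adjust the scale dynamically, shrinking it when scaled gradients overflow to infinity and growing it back when training is stable.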

Key words: Natural language processing, Ascend, Deep learning, Model migration, Platform architecture

CLC Number: TP183
[1]LING S,NGUYEN K,ROUX-LANGLOIS A,et al.A lattice-based group signature scheme with verifier-local revocation [J].Theoretical Computer Science,2018,730(19):1-20.
[2]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics.Human Language Technologies,Volume 1,Minneapolis,2019:4171-4186.
[3]JIANG C,CHEN T S.Chinese AI starts from “core”[J].Zhong Guan Cun,2019(2):48-51.
[4]Anonymous.Huawei Released AI Processor Ascend 910[J].Office Information,2019,24(19):25.
[5]RAN Y L.AI Chip Industry And Trends[J].Big Data Time,2019(4):40-45.
[6]Anonymous.Alibaba Released Self-Developed AI Chip Han-Guang 800[J].Intelligent Building & Smart City,2019(10):5.
[7]MA Y J,YU D H,WU T,et al.PaddlePaddle:An Open-Source Deep Learning Platform From Industrial Practice[J].Frontiers of Data and Computing,2019,1(1):105-115.
[8]YU F.Research on the Next-Generation Deep Learning Framework[J].Big Data Research,2020,6(4):69-78.
[9]BAHDANAU D,CHO K,BENGIO Y.Neural Machine Translation by Jointly Learning to Align And Translate[C]//Proceedings of the International Conference on Learning Representations,2015.
[10]LAN Z Z,CHEN M,GOODMAN S,et al.ALBERT:A Lite BERT for Self-Supervised Learning of Language Representations[C]//Proceedings of the International Conference on Learning Representations.Addis Ababa,2020:1-17.
[11]ZHANG Y F,YU X L,CUI Z Y,et al.Every Document Owns Its Structure:Inductive Text Classification via Graph Neural Networks[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:334-339.
[12]SUTSKEVER I,VINYALS O,LE Q.Sequence to Sequence Learning with Neural Networks[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge,2014:3104-3112.
[13]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[14]CHUNG J Y,GÜLCEHRE C,CHO K,et al.Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling[C]//Proceedings of Advances in Neural Information Processing Systems Deep Learning and Representation Learning Workshop.2014.
[15]HE K M,ZHANG X Y,REN S Q,et al.Deep Residual Learning for Image Recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas,2016:770-778.
[16]MICIKEVICIUS P,NARANG S,ALBEN J,et al.Mixed precision training[C]//Proceedings of International Conference on Learning Representations.Vancouver,2018.
[17]RAJPURKAR P,JIA R,LIANG P.Know What You Don’t Know:Unanswerable Questions for SQuAD[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.Melbourne,2018:784-789.
[18]WANG A,SINGH A,MICHAEL J,et al.GLUE:A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding[C]//Proceedings of the 2018 EMNLP Workshop:Analyzing and Interpreting Neural Networks for NLP.Brussels,2018:353-355.
[19]PANG B,LEE L.A Sentimental Education:Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts[C]//Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.2004:271-278.
[20]SANG E F T K,MEULDER F D.Introduction to the CoNLL-2003 Shared Task:Language-Independent Named Entity Recognition[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL.2003:142-147.
[21]PAPINENI K,ROUKOS S,WARD T,et al.Bleu:A Method for Automatic Evaluation of Machine Translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.2002:311-318.