Computer Science ›› 2024, Vol. 51 ›› Issue (1): 50-59. doi: 10.11896/jsjkx.230600051

• Special Topic for the 50th Anniversary of the Journal •


Study on Model Migration of Natural Language Processing for Domestic Deep Learning Platform

GE Huibin1, WANG Dexin1, ZHENG Tao2, ZHANG Ting3, XIONG Deyi1   

  1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
    2 Nanjing Research Institute, Huawei Technologies Co., Ltd., Nanjing 210000, China
    3 Global Tone Communication Technology Co., Ltd., Beijing 100131, China
  • Received: 2023-06-06  Revised: 2023-10-07  Online: 2024-01-15  Published: 2024-01-12
  • Corresponding author: XIONG Deyi (dyxiong@tju.edu.cn)
  • About author: GE Huibin (gehuibin@tju.edu.cn), born in 1997, postgraduate. His main research interests include deep learning and natural language processing.
    XIONG Deyi, born in 1979, Ph.D, professor, Ph.D supervisor. His main research interests include natural language processing and machine translation.
  • Supported by:
    Huawei Technologies Co., Ltd. and Tianjin University NRE Cooperation Project (20101300441922C), National Key R&D Program of China (2020AAA0108000) and Key R&D Program of Yunnan Province (202203AA080004).


Abstract: Deep learning platforms play an essential role in the development of the new generation of artificial intelligence. In recent years, China's domestic high-performance AI software and hardware systems, represented by the Ascend platform, have developed rapidly, opening up a new path for domestic deep learning platforms. At the same time, to uncover and resolve potential flaws in the Ascend system, the platform's developers, together with researchers, are actively migrating commonly used deep learning models to it. This paper advances these efforts from the perspective of natural language processing, aiming at how to refine the domestic deep learning platform. It highlights four natural language processing tasks, machine reading comprehension, neural machine translation, sequence labeling and text classification, and migrates four classical models, ALBERT, RNNSearch, BERT-CRF and TextING, to the Ascend platform in detail. Based on this migration study, the paper summarizes the deficiencies of the Ascend platform's architecture design for natural language processing research and business in four essential aspects: 1) the lack of dynamic memory allocation for computation graph nodes; 2) incompatibilities in sinking resource operators to the accelerator device side; 3) graph-kernel fusion that is not flexible enough to handle unseen model structures; and 4) defects in the mixed-precision training strategy. For these problems, the paper puts forward workarounds or solutions and verifies them experimentally. Finally, it provides directions for future optimization and constructive suggestions for domestic deep learning platforms in China and beyond.
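The first deficiency is a concrete obstacle for NLP workloads, whose input shapes change with sentence length on a static-graph compiler. The paper does not publish its migration code; the sketch below is only a generic illustration of the common bucket-padding workaround on static-shape accelerators, with hypothetical bucket lengths and padding id:

```python
import numpy as np

# Hypothetical bucket lengths and padding id, chosen for illustration only.
BUCKETS = (32, 64, 128)
PAD_ID = 0

def pad_to_bucket(token_ids):
    """Pad a batch of variable-length token sequences to the smallest
    bucket length that fits, so a static-graph compiler sees only a few
    fixed input shapes instead of a new shape for every batch."""
    longest = max(len(seq) for seq in token_ids)
    # Smallest bucket that holds the longest sequence; truncate to the
    # largest bucket if nothing fits.
    target = next((b for b in BUCKETS if b >= longest), BUCKETS[-1])
    batch = np.full((len(token_ids), target), PAD_ID, dtype=np.int32)
    for i, seq in enumerate(token_ids):
        seq = seq[:target]
        batch[i, :len(seq)] = seq
    return batch

# Three sentences of different lengths all map to one static shape (3, 32).
print(pad_to_bucket([[5, 7, 9], [4, 2], [8, 1, 3, 6, 2]]).shape)
```

The trade-off is wasted computation on padded positions: fewer buckets mean fewer graph compilations but more padding overhead.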
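Likewise, the fourth deficiency concerns mixed-precision training, where float16 arithmetic can silently lose small gradient values. The following NumPy sketch illustrates the standard loss-scaling remedy in general terms (the scale factor is hypothetical), not the paper's own fix:

```python
import numpy as np

# Gradients below float16's smallest subnormal (about 6e-8) vanish
# when the backward pass is computed in half precision.
tiny_grad = np.float32(1e-8)
print(np.float16(tiny_grad))              # 0.0 -- the update is lost

# Static loss scaling (Micikevicius et al.): multiply the loss, and thus
# every gradient, by a large constant before the fp16 cast, then divide
# it back out when applying the update to the fp32 master weights.
SCALE = np.float32(2.0 ** 14)             # hypothetical scale factor
scaled = np.float16(tiny_grad * SCALE)    # ~1.64e-4, representable in fp16
print(np.float32(scaled) / SCALE)         # ~1e-8, recovered for the update
```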

Key words: Natural language processing, Ascend, Deep learning, Model migration, Platform architecture

CLC Number: TP183