Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (11A): 136-141. doi: 10.11896/jsjkx.210100025
蒋琪1, 苏伟1, 谢莹2, 周弘安平2, 张久文1, 蔡川1
JIANG Qi1, SU Wei1, XIE Ying2, ZHOUHONG An-ping2, ZHANG Jiu-wen1, CAI Chuan1
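Abstract: Automatic Chinese-to-Braille conversion is an important problem both for improving the daily life and education of China's 17 million visually impaired people and for implementing the national information accessibility program. Existing Chinese-Braille conversion methods all use a multi-step pipeline: they first apply Braille word segmentation and concatenation rules to the Chinese text, then mark the tone of each character, and finally synthesize the Braille text from the segmentation and tone information. This paper proposes an end-to-end Chinese-Braille conversion method based on the encoder-decoder model Transformer, which is trained on a parallel Chinese-Braille corpus. From six months of People's Daily (《人民日报》) text, about 12 million Chinese characters, the paper builds parallel Chinese-Braille corpora for three schemes: National Common Braille, Current Braille, and Two-Cell (Shuangpin) Braille. Experimental results show that the proposed method converts Chinese characters to Braille in a single step, with BLEU scores of 80.25%, 79.08%, and 79.29% on National Common Braille, Current Braille, and Two-Cell Braille respectively. Compared with existing Chinese-Braille conversion methods, the proposed method needs a corpus that is easier to build and has lower engineering complexity.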
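The BLEU scores above measure n-gram overlap between the generated Braille and a reference Braille text. The following is a minimal sketch of sentence-level BLEU in Python, assuming uniform weights over 1- to 4-grams, a single reference, no smoothing, and one token per Braille cell. The abstract does not state how the paper tokenizes Braille or which BLEU implementation it uses, so the function names and the placeholder tokens (`c1`, `c2`, ...) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: uniform weights, one reference, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical Braille-cell tokens; real data would hold Braille cells.
reference = ["c1", "c2", "c3", "c4", "c5"]
candidate = ["c1", "c2", "c3", "c4", "c9"]  # last cell is wrong
score = bleu(candidate, reference)
print(f"BLEU = {score:.4f}")
```

A corpus-level score, as usually reported in papers, pools the n-gram counts over all sentences before taking the precisions rather than averaging per-sentence scores.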