Computer Science (计算机科学), 2025, Vol. 52, Issue 5: 227-234. doi: 10.11896/jsjkx.240400035
何春辉, 葛斌, 张翀, 徐浩
HE Chunhui, GE Bin, ZHANG Chong, XU Hao
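Abstract: Four-character idioms are a special class of words and are very common in Chinese usage. As Chinese error-correction tasks have developed, intelligent correction of Chinese idioms has become a research hotspot in natural language processing. To address the low accuracy of existing methods on this task, this paper proposes an intelligent Chinese idiom correction model built on a fixed-length Seq2Seq network. At its base, the model fuses the Seq2Seq architecture with an attention mechanism and, combined with a hybrid dataset construction method, is trained as a Seq2Seq model whose input and output sequence lengths are both fixed, which is then used to correct Chinese four-character idioms. Experimental results on a large public Chinese idiom correction dataset show that the fixed-length Seq2Seq model outperforms existing methods and lets a single model handle three different correction targets at once: out-of-order characters, missing characters, and wrong characters. Its overall correction accuracy reaches 91.3%, which is 11.73% higher than the best baseline model.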
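The abstract does not say how the hybrid dataset is built, so the following is only a minimal sketch of one way fixed-length training pairs covering the three error types could be generated. Every name here (`build_pairs`, `shuffle_error`, `missing_error`, `wrong_error`), the `_` placeholder for a missing character, and the choice to sample replacement characters from the other idioms are assumptions made for illustration, not the authors' method.

```python
import random

IDIOM_LEN = 4   # four-character idioms: source and target both have length 4
MASK = "_"      # hypothetical placeholder that keeps a "missing" slot in place


def shuffle_error(idiom, rng):
    """Out-of-order error: return a permutation that differs from the idiom."""
    chars = list(idiom)
    if len(set(chars)) < 2:
        return idiom  # all characters identical, no distinct ordering exists
    while True:
        rng.shuffle(chars)
        noisy = "".join(chars)
        if noisy != idiom:
            return noisy


def missing_error(idiom, rng):
    """Missing-character error: mask one position so the length stays fixed."""
    i = rng.randrange(IDIOM_LEN)
    return idiom[:i] + MASK + idiom[i + 1:]


def wrong_error(idiom, rng, vocab):
    """Wrong-character error: replace one character with a different one."""
    i = rng.randrange(IDIOM_LEN)
    candidates = [c for c in vocab if c != idiom[i]]
    return idiom[:i] + rng.choice(candidates) + idiom[i + 1:]


def build_pairs(idioms, seed=0):
    """Return (noisy, correct) pairs, one per error type for each idiom."""
    rng = random.Random(seed)
    # Assumed character pool: every character seen in the idiom list.
    vocab = sorted({c for idiom in idioms for c in idiom})
    pairs = []
    for idiom in idioms:
        if len(idiom) != IDIOM_LEN:
            raise ValueError(f"expected {IDIOM_LEN} characters: {idiom!r}")
        pairs.append((shuffle_error(idiom, rng), idiom))
        pairs.append((missing_error(idiom, rng), idiom))
        pairs.append((wrong_error(idiom, rng, vocab), idiom))
    return pairs


if __name__ == "__main__":
    for noisy, correct in build_pairs(["一心一意", "画蛇添足"]):
        print(noisy, "->", correct)
```

Because every noisy input and every target has exactly four symbols, a single encoder-decoder can be trained on the mixed pairs without padding or length prediction, which is the property the fixed-length design relies on.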