Intelligent Error Correction Model for Chinese Idioms Fused with Fixed-length Seq2Seq Network

doi:10.11896/jsjkx.240400035

Abstract

Four-character idioms are a special class of words that are widely used in Chinese. As Chinese error correction has developed, intelligent error correction for Chinese idioms has become a research hotspot in natural language processing (NLP). To address the low accuracy of existing methods on this task, this paper proposes an intelligent error correction model for Chinese idioms fused with a fixed-length Seq2Seq network. At the bottom layer, the Seq2Seq architecture and an attention mechanism are combined with a hybrid dataset construction method to train a Seq2Seq model whose input and output sequence lengths are fixed, which is then used to correct Chinese four-character idioms. Experimental results on a large public Chinese idiom error correction dataset show that the fixed-length Seq2Seq model outperforms existing methods and can correct three different types of idiom errors: out-of-order characters, missing characters, and wrong characters. Its comprehensive error correction accuracy reaches 91.3%, which is 11.73% higher than the best baseline model.
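The abstract names three error types and a fixed input/output length but does not spell out how training pairs are built. The following is a minimal sketch of one way such (corrupted, correct) pairs could be generated. It is not the authors' hybrid dataset construction method: the function names, the `PAD` placeholder used to keep a missing-character input at length 4, and the choice to draw wrong characters from the idiom vocabulary are all assumptions made for illustration.

```python
import random

IDIOM_LEN = 4   # four-character idioms: source and target length are both fixed
PAD = "□"       # hypothetical placeholder that marks a missing character


def make_out_of_order(idiom, rng):
    """Return the idiom with its characters permuted into a different order."""
    chars = list(idiom)
    shuffled = chars[:]
    # Reshuffle until the order changes (impossible only if all characters are equal).
    while shuffled == chars and len(set(chars)) > 1:
        rng.shuffle(shuffled)
    return "".join(shuffled)


def make_missing(idiom, rng):
    """Replace one character with PAD so the sequence length stays fixed."""
    i = rng.randrange(IDIOM_LEN)
    return idiom[:i] + PAD + idiom[i + 1:]


def make_wrong(idiom, rng, vocab):
    """Substitute one character with a different character drawn from vocab."""
    i = rng.randrange(IDIOM_LEN)
    candidates = [c for c in vocab if c != idiom[i]]
    return idiom[:i] + rng.choice(candidates) + idiom[i + 1:]


def build_pairs(idioms, seed=0):
    """Build (corrupted, correct, error_type) triples, one per error type per idiom."""
    rng = random.Random(seed)
    # Assumption: wrong characters are sampled from the characters seen in the idiom list.
    vocab = sorted({c for idiom in idioms for c in idiom})
    pairs = []
    for idiom in idioms:
        if len(idiom) != IDIOM_LEN:
            continue  # keep only four-character idioms
        pairs.append((make_out_of_order(idiom, rng), idiom, "out_of_order"))
        pairs.append((make_missing(idiom, rng), idiom, "missing"))
        pairs.append((make_wrong(idiom, rng, vocab), idiom, "wrong"))
    return pairs


if __name__ == "__main__":
    for src, tgt, kind in build_pairs(["画蛇添足", "守株待兔"]):
        print(kind, src, "->", tgt)
```

Because every corrupted source and every target has exactly four positions, an encoder-decoder trained on such pairs never needs variable-length padding or an end-of-sequence decision, which is presumably part of the appeal of the fixed-length setting.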

Key words: Idiom error correction, Fixed-length Seq2Seq, BiGRU, Attention mechanism

CLC Number: TP391

HE Chunhui, GE Bin, ZHANG Chong, XU Hao. Intelligent Error Correction Model for Chinese Idioms Fused with Fixed-length Seq2Seq Network[J]. Computer Science, 2025, 52(5): 227-234.

