Computer Science ›› 2024, Vol. 51 ›› Issue (11): 248-254. doi: 10.11896/jsjkx.231000096
• Artificial Intelligence •
LIU Xiaofeng, ZHENG Yucheng, LI Dongyang
[1] SCHWENK H, WENZEK G, EDUNOV S, et al. CCMatrix: Mining billions of high-quality parallel sentences on the Web[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics. 2021: 6490-6500.
[2] EL-KISHKY A, CHAUDHARY V, GUZMAN F, et al. CCAligned: A massive collection of cross-lingual web-document pairs[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 2020: 5960-5969.
[3] LISON P, TIEDEMANN J. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles[C]//Proceedings of the Tenth International Conference on Language Resources and Evaluation. 2016: 923-929.
[4] ZIEMSKI M, JUNCZYS-DOWMUNT M, POULIQUEN B. The United Nations parallel corpus v1.0[C]//Proceedings of the Tenth International Conference on Language Resources and Evaluation. 2016: 3530-3534.
[5] KOEHN P. Europarl: A parallel corpus for statistical machine translation[C]//Proceedings of Machine Translation Summit. 2005: 79-86.
[6] MORISHITA M, CHOUSA K, SUZUKI J, et al. JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus[C]//Proceedings of International Conference on Language Resources and Evaluation. 2022: 6704-6710.
[7] ESPLÀ-GOMIS M, FORCADA M, RAMÍREZ-SÁNCHEZ G, et al. Paracrawl: Web-scale parallel corpora for the languages of the EU[C]//Proceedings of Machine Translation Summit. 2019: 118-119.
[8] JUSSA C, CROSS M, ÇELEBI J, et al. No Language Left Behind: Scaling Human-Centered Machine Translation[J]. arXiv:2207.04672, 2022.
[9] TUFIS D, ION R, DANIEL S, et al. Wikipedia as an SMT training corpus[C]//Proceedings of the International Conference Recent Advances in Natural Language Processing. 2013: 702-709.
[10] SCHWENK H, CHAUDHARY V, SUN S, et al. Wikimatrix: Mining 135m parallel sentences in 1620 language pairs from Wikipedia[C]//Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. 2021: 1351-1361.
[11] JOHNSON J, DOUZE M, JÉGOU H. Billion-scale similarity search with gpus[J]. IEEE Transactions on Big Data, 2019, 7(3): 535-547.
[12] ARTETXE M, SCHWENK H. Margin-based parallel corpus mining with multilingual sentence embeddings[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 3197-3203.
[13] KVAPILÍKOVÁ I, ARTETXE M, LABAKA G, et al. Unsupervised multilingual sentence embeddings for parallel corpus mining[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 255-262.
[14] RESNIK P. Mining the Web for Bilingual Text[C]//Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. 1999: 527-534.
[15] BUCK C, KOEHN P. Findings of the WMT 2016 bilingual document alignment shared task[C]//Proceedings of the First Conference on Machine Translation. 2016: 554-563.
[16] FANG X Y, YANG Y F, CER D, et al. Language-agnostic BERT Sentence Embedding[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022: 878-889.
[17] KOEHN P, KHAYRALLAH H, HEAFIELD K, et al. Findings of the WMT 2018 shared task on parallel corpus filtering[C]//Proceedings of the Third Conference on Machine Translation. 2018: 726-739.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[19] KUDO T, RICHARDSON J. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 66-71.
[20] POST M. A Call for Clarity in Reporting BLEU Scores[C]//Proceedings of the Third Conference on Machine Translation. 2018: 186-191.
[1] YANG Binxia, LUO Xudong, SUN Kaili. Recent Progress on Machine Translation Based on Pre-trained Language Models [J]. Computer Science, 2024, 51(6A): 230700112-8.
[2] GU Shiwei, LIU Jing, LI Bingchun, XIONG Deyi. Survey of Unsupervised Sentence Alignment [J]. Computer Science, 2024, 51(1): 60-67.
[3] SUN Kaili, LUO Xudong, Michael Y. LUO. Survey of Applications of Pretrained Language Models [J]. Computer Science, 2023, 50(1): 176-184.
[4] DONG Zhen-heng, REN Wei-ping, YOU Xin-dong, LYU Xue-qiang. Machine Translation Method Integrating New Energy Terminology Knowledge [J]. Computer Science, 2022, 49(6): 305-312.
[5] LIU Jun-peng, SU Jin-song, HUANG De-gen. Incorporating Language-specific Adapter into Multilingual Neural Machine Translation [J]. Computer Science, 2022, 49(1): 17-23.
[6] YU Dong, XIE Wan-ying, GU Shu-hao, FENG Yang. Similarity-based Curriculum Learning for Multilingual Neural Machine Translation [J]. Computer Science, 2022, 49(1): 24-30.
[7] HOU Hong-xu, SUN Shuo, WU Nier. Survey of Mongolian-Chinese Neural Machine Translation [J]. Computer Science, 2022, 49(1): 31-40.
[8] LIU Yan, XIONG De-yi. Construction Method of Parallel Corpus for Minority Language Machine Translation [J]. Computer Science, 2022, 49(1): 41-46.
[9] LIU Chuang, XIONG De-yi. Survey of Multilingual Question Answering [J]. Computer Science, 2022, 49(1): 65-72.
[10] NING Qiu-yi, SHI Xiao-jing, DUAN Xiang-yu, ZHANG Min. Unsupervised Domain Adaptation Based on Style Aware [J]. Computer Science, 2022, 49(1): 271-278.
[11] LIU Xiao-die. Recognition and Transformation for Complex Noun Phrases Based on Boundary Perception [J]. Computer Science, 2021, 48(6A): 299-305.
[12] GUO Dan, TANG Shen-geng, HONG Ri-chang, WANG Meng. Review of Sign Language Recognition, Translation and Generation [J]. Computer Science, 2021, 48(3): 60-70.
[13] ZHOU Xiao-shi, ZHANG Zi-wei, WEN Juan. Natural Language Steganography Based on Neural Machine Translation [J]. Computer Science, 2021, 48(11A): 557-564.
[14] QIAO Bo-wen, LI Jun-hui. Neural Machine Translation Combining Source Semantic Roles [J]. Computer Science, 2020, 47(2): 163-168.
[15] JI Ming-xuan, SONG Yu-rong. New Machine Translation Model Based on Logarithmic Position Representation and Self-attention [J]. Computer Science, 2020, 47(11A): 86-91.