Computer Science ›› 2022, Vol. 49 ›› Issue (1): 65-72.doi: 10.11896/jsjkx.210900003

Special Issue: Natural Language Processing

• Multilingual Computing Advanced Technology •

Survey of Multilingual Question Answering

LIU Chuang, XIONG De-yi   

  1. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  • Received: 2021-07-14  Revised: 2021-09-15  Online: 2022-01-15  Published: 2022-01-18
  • About author: LIU Chuang, born in 1990, Ph.D. His main research interests include question answering and commonsense reasoning.
    XIONG De-yi, born in 1979, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include machine translation, dialogue, and natural language generation.
  • Supported by:
    National Key Research and Development Program (2019QY1802).

Abstract: Multilingual question answering is one of the research hotspots in natural language processing. It aims to enable a model to return a correct answer based on its understanding of questions and texts given in different languages. With the rapid development of machine translation and the wide application of multilingual pre-training techniques in natural language processing, multilingual question answering has also advanced rapidly. This paper first systematically reviews existing multilingual question answering methods, dividing them into feature-based, translation-based, pre-training-based and dual encoding-based methods, and introduces the usage and characteristics of each. It then discusses current work on multilingual question answering tasks, dividing them into text-based and multi-modal tasks and giving a basic definition of each. Moreover, it summarizes the dataset statistics, evaluation metrics and multilingual question answering methods involved in these tasks. Finally, it outlines future research directions for multilingual question answering.
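A minimal sketch of the pre-training-based approach surveyed above: a multilingual pre-trained model fine-tuned on English extractive QA data can answer questions about a passage even when the question and the passage are in different languages, with no language-specific training. The example assumes the Hugging Face transformers library and a publicly released XLM-RoBERTa checkpoint fine-tuned on SQuAD 2.0; the checkpoint name is an assumption, not something specified by this survey.

    # Hedged sketch: zero-shot cross-lingual extractive QA with a multilingual
    # pre-trained model. Requires: pip install transformers torch
    from transformers import pipeline

    # The checkpoint name is an assumption; any multilingual extractive-QA
    # checkpoint fine-tuned on SQuAD-style data should behave similarly.
    qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

    # English passage, Spanish question: the shared multilingual representation
    # lets the model locate the answer span without any Spanish QA training data.
    context = ("Tianjin University is located in Tianjin, China. "
               "It was established in 1895 as Peiyang University.")
    question = "¿En qué año fue fundada la Universidad de Tianjin?"

    result = qa(question=question, context=context)
    print(result["answer"], result["score"])  # expected answer span: "1895"

By contrast, a translation-based pipeline would first machine-translate the question (and possibly the passage) into the language of a monolingual QA model; the pre-training-based approach avoids that extra step by relying on a shared cross-lingual representation.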

Key words: Machine translation, Multilingual pre-training techniques, Multilingual question answering, Multi-modal-based multilingual question answering, Text-based multilingual question answering

CLC Number: TP391