Computer Science ›› 2022, Vol. 49 ›› Issue (1): 73-79. doi: 10.11896/jsjkx.210900036

• Multilingual Computing Advanced Technology •

Improving Low-resource Dependency Parsing Using Multi-strategy Data Augmentation

XIAN Yan-tuan, GAO Fan-ya, XIANG Yan, YU Zheng-tao, WANG Jian   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2021-09-03 Revised: 2021-10-13 Online: 2022-01-15 Published: 2022-01-18
  • About author: XIAN Yan-tuan, born in 1981, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include information retrieval and natural language processing.
    XIANG Yan, born in 1979, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include text mining and sentiment analysis.
  • Supported by:
    National Natural Science Foundation of China (61732005, 61972186), Yunnan Provincial Major Science and Technology Special Plan Projects (202002AD080001, 202103AA080015) and Yunnan High and New Technology Industry Project (201606).

Abstract: Dependency parsing aims to identify the syntactic dependencies between the words in a sentence. It can provide syntactic features that improve model performance on tasks such as information extraction, automatic question answering and machine translation. The size of the training data has a significant impact on the performance of a dependency parsing model: a lack of training data causes serious unknown-word and overfitting problems. This paper proposes several data augmentation strategies for low-resource dependency parsing. Synonym substitution effectively expands the training data and alleviates the unknown-word problem. Multiple Mixup-based data augmentation strategies effectively alleviate overfitting and improve the generalization ability of the model. Experimental results on the Universal Dependencies treebanks (UD treebanks) show that the proposed methods effectively improve dependency parsing performance for Thai, Vietnamese and English under small-scale training corpus conditions.
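
The two augmentation families named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym table, the function names, and the `prob` and `alpha` values are hypothetical, and the abstract does not say which representations the paper's Mixup variants interpolate, so generic feature vectors are mixed here. The Mixup step follows the standard formulation of Zhang et al., where a coefficient lam is drawn from Beta(alpha, alpha) and both inputs and labels are interpolated with it.

```python
import random

# Hypothetical toy synonym table; the paper's actual synonym source is not given here.
SYNONYMS = {"big": ["large", "huge"], "quick": ["fast", "rapid"]}


def substitute_synonyms(tokens, heads, labels, synonyms, prob=0.3, rng=None):
    """Return a new (tokens, heads, labels) triple with some tokens swapped.

    Replacement is one-for-one, so the head indices and relation labels
    of the original tree remain valid for the augmented sentence.
    """
    rng = rng or random.Random()
    new_tokens = []
    for tok in tokens:
        candidates = synonyms.get(tok.lower())
        if candidates and rng.random() < prob:
            new_tokens.append(rng.choice(candidates))
        else:
            new_tokens.append(tok)
    return new_tokens, list(heads), list(labels)


def mixup(x_a, x_b, y_a, y_b, alpha=0.2, rng=None):
    """Interpolate two feature vectors and their label distributions.

    lam ~ Beta(alpha, alpha); x = lam * x_a + (1 - lam) * x_b, same for y.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["The", "quick", "dog", "runs"]
    heads = [3, 3, 4, 0]  # 1-based head index per token, 0 marks the root
    labels = ["det", "amod", "nsubj", "root"]
    print(substitute_synonyms(tokens, heads, labels, SYNONYMS, prob=1.0, rng=rng))
    print(mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], rng=rng))
```

Because a substituted word takes the place of exactly one token, the dependency tree can be reused without re-annotation, which is what makes this kind of substitution cheap for a treebank.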

Key words: Dependency parsing, Low-resource language, Mixup data augmentation, Multi-strategy, Synonym substitution

CLC Number: 

  • TP391