Computer Science, 2021, Vol. 48, Issue (11): 319-326. doi: 10.11896/jsjkx.201000099

• Artificial Intelligence •

Knowledge Distillation Based Implicit Discourse Relation Recognition
(基于知识蒸馏的隐式篇章关系识别)

YU Liang (俞亮), WEI Yong-feng (魏永丰), LUO Guo-liang (罗国亮), WU Chang-xing (邬昌兴)

  1. School of Software, East China Jiaotong University, Nanchang 330013, China
  • Received: 2020-10-18 Revised: 2021-03-10 Online: 2021-11-15 Published: 2021-11-10
  • Corresponding author: WU Chang-xing (wuchangxing@ecjtu.edu.cn)
  • Author contact: yu3liang@qq.com
  • About author: YU Liang, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include natural language processing and deep learning.
    WU Chang-xing, born in 1981, Ph.D., lecturer, is a member of China Computer Federation. His main research interests include natural language processing and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61866012), Natural Science Foundation of Jiangxi Province (20181BAB202012) and Science and Technology Research Project of Jiangxi Education Department (GJJ180329).

Abstract: Because connectives are absent, implicit discourse relation recognition models must infer the discourse relation (e.g., causal) between two arguments (clauses or sentences) from their semantics alone, and the performance of these models is still relatively low. Implicit discourse relations are also difficult for corpus annotators to label, so annotators usually first insert an appropriate connective to assist the annotation of each instance. Based on this observation, a knowledge distillation based method is proposed for implicit discourse relation recognition that makes use of the connectives inserted during corpus annotation. Specifically, a connective-enhanced model is first constructed to integrate the connective information, and the knowledge learned by this model is then transferred to the implicit discourse relation recognition model via knowledge distillation. Experimental results on the commonly used PDTB dataset show that the proposed method achieves better recognition performance than comparable baseline methods.

Key words: Connective, Deep learning, Discourse structure analysis, Implicit discourse relation recognition, Knowledge distillation
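
The transfer step described in the abstract can be illustrated with a minimal sketch. It assumes the common soft-target formulation of knowledge distillation, in which a student is trained on a weighted sum of a hard-label loss and a loss against the teacher's temperature-softened output distribution. The abstract does not give the paper's actual loss, temperature, or weighting, so the function names and the parameters `temperature` and `alpha` below are hypothetical. Here the teacher stands for the connective-enhanced model (which sees the inserted connective) and the student for the implicit discourse relation recognition model (which does not).

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(student_logits, teacher_logits, gold_index,
                      temperature=2.0, alpha=0.5):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    (1 - alpha) * hard-label cross-entropy
    + alpha * T^2 * cross-entropy against the teacher's softened outputs.
    """
    n = len(student_logits)
    one_hot = [1.0 if i == gold_index else 0.0 for i in range(n)]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft = cross_entropy(soft_teacher, soft_student)
    # The T^2 factor keeps the soft term's gradient scale comparable
    # to the hard term when the temperature changes.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


# Illustrative logits over four relation classes (made-up numbers).
teacher_logits = [2.0, 0.5, 0.1, -1.0]   # connective-enhanced model
student_logits = [1.0, 0.8, 0.2, -0.5]   # implicit relation model
loss = distillation_loss(student_logits, teacher_logits, gold_index=0)
print(f"distillation loss: {loss:.4f}")
```

With `alpha=0` the loss reduces to ordinary supervised training of the student; with `alpha=1` the student only imitates the teacher. In practice the teacher's logits would come from a trained network and the loss would be minimized with an automatic differentiation framework rather than evaluated on fixed lists.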

CLC Number:

  • TP391.1