Computer Science (计算机科学), 2020, Vol. 47, Issue (10): 207-214. doi: 10.11896/jsjkx.191200183
高李政1, 周刚1, 黄永忠2, 罗军勇1, 王树伟1
GAO Li-zheng1, ZHOU Gang1, HUANG Yong-zhong2, LUO Jun-yong1, WANG Shu-wei1
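Abstract: Event extraction is an active research topic in natural language processing (NLP). Most existing event extraction models are built on small-scale training sets and cannot be applied to large-scale open domains. To address the difficulty of representing events in large-scale open-domain event extraction, this paper proposes a method for computing event vectors based on the factorization of Zipf's co-occurrence matrix. First, event tuples are extracted from an open corpus to serve as event labels, and the tuples are abstracted, pruned, and disambiguated. Then, Zipf's co-occurrence matrix is used to represent the context distribution of events, principal component analysis (PCA) is applied to factorize the co-occurrence matrix and obtain initial event vectors, and an autoencoder applies a nonlinear transformation to these initial vectors. The event vectors are evaluated on two tasks, nearest-neighbor detection and event detection. The results show that event vectors obtained by factorizing Zipf's co-occurrence matrix represent the similarity and relatedness between events at a global level and avoid the semantic drift caused by overly fine-grained encoding.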
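The factorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how Zipf's co-occurrence matrix is weighted, so the sketch substitutes a plain windowed count matrix, and it omits the autoencoder stage. The function names `cooccurrence_matrix` and `pca_vectors`, the `window` parameter, and the toy event sequences are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_matrix(sequences, window=2):
    """Count how often two event labels appear within `window` positions.

    Stand-in for the paper's Zipf's co-occurrence matrix, whose exact
    weighting is not given in the abstract (assumption).
    """
    vocab = sorted({event for seq in sequences for event in seq})
    index = {event: i for i, event in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=float)
    for seq in sequences:
        for i, event in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[event], index[seq[j]]] += 1.0
    return vocab, counts

def pca_vectors(matrix, dim):
    """Project each row onto the top `dim` principal components via SVD."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]

# Hypothetical event-label sequences, one per document.
sequences = [
    ["attack", "injure", "arrest"],
    ["attack", "die", "arrest"],
    ["meet", "sign", "announce"],
    ["meet", "announce", "sign"],
]
vocab, counts = cooccurrence_matrix(sequences)
vectors = pca_vectors(counts, dim=2)  # one initial 2-d vector per event label
```

Each row of `vectors` is an initial event vector for the label at the same position in `vocab`. In the pipeline the abstract describes, these vectors would then be passed through an autoencoder for a nonlinear transformation.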