Computer Science ›› 2020, Vol. 47 ›› Issue (10): 207-214. doi: 10.11896/jsjkx.191200183

• Artificial Intelligence •

Open Domain Event Vector Algorithm Based on Zipf's Co-occurrence Matrix Factorization

GAO Li-zheng1, ZHOU Gang1, HUANG Yong-zhong2, LUO Jun-yong1, WANG Shu-wei1   

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
    2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541000, China
  • Received: 2019-12-31  Revised: 2020-04-28  Online: 2020-10-15  Published: 2020-10-16
  • About author: GAO Li-zheng, born in 1990, doctoral student. His research interests include information extraction and data mining.
    ZHOU Gang, born in 1974, Ph.D, research fellow, professor. His research interests include big data and data mining.
  • Supported by:
    National Natural Science Foundation of China (61602508, 61866008)

Abstract: Event extraction is one of the hot topics in natural language processing (NLP). Existing event extraction models are mostly trained on small-scale corpora and cannot be applied to open domain event extraction. To ease the difficulty of representing events in large-scale open domain event extraction, we propose an event embedding method based on Zipf's co-occurrence matrix factorization. We first extract event tuples from large-scale open domain corpora and then perform tuple abstraction, pruning and disambiguation. A Zipf's co-occurrence matrix is used to represent the context distribution of events, and the resulting matrix is factorized by principal component analysis (PCA) to generate event vectors. Finally, an autoencoder is constructed to transform the vectors nonlinearly. We evaluate the generated vectors on nearest-neighbor and event identification tasks. The experimental results show that our method captures event similarity and relatedness globally and avoids the semantic deviation caused by overly fine-grained encoding.
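
A minimal sketch of the pipeline described above, on a toy set of already-extracted event tuples. The 1/rank Zipf prior over context words, the helper names, the toy corpus and the autoencoder configuration are illustrative assumptions, not the authors' implementation; the hidden-layer activations stand in for the nonlinearly transformed event vectors.

import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

def build_zipf_cooccurrence(event_contexts):
    """event_contexts maps an event tuple (as a string) to the context
    tokens observed around its mentions.  Each raw count is damped by a
    Zipf-style 1/rank prior over context-word frequency (an assumption
    standing in for the paper's Zipf's co-occurrence matrix)."""
    events = sorted(event_contexts)
    word_freq = Counter(w for ws in event_contexts.values() for w in ws)
    vocab = [w for w, _ in word_freq.most_common()]            # frequency-ranked
    zipf_weight = {w: 1.0 / (r + 1) for r, w in enumerate(vocab)}
    col = {w: j for j, w in enumerate(vocab)}
    M = np.zeros((len(events), len(vocab)))
    for i, ev in enumerate(events):
        for w in event_contexts[ev]:
            M[i, col[w]] += zipf_weight[w]
    return events, np.log1p(M)                                 # squash heavy-tailed counts

def event_vectors(M, pca_dim=2, ae_dim=2):
    """PCA factorization of the co-occurrence matrix, followed by a tiny
    autoencoder; the hidden-layer activations are returned as event vectors."""
    pca_vecs = PCA(n_components=pca_dim).fit_transform(M)
    ae = MLPRegressor(hidden_layer_sizes=(ae_dim,), activation="tanh",
                      max_iter=5000, random_state=0)
    ae.fit(pca_vecs, pca_vecs)                                 # train to reconstruct the input
    return np.tanh(pca_vecs @ ae.coefs_[0] + ae.intercepts_[0])

if __name__ == "__main__":
    toy = {"(company, acquire, startup)": ["deal", "buy", "price", "market"],
           "(firm, purchase, rival)":     ["deal", "buy", "merger", "market"],
           "(team, win, match)":          ["score", "goal", "league", "fans"]}
    events, M = build_zipf_cooccurrence(toy)
    for ev, v in zip(events, event_vectors(M)):
        print(ev, np.round(v, 3))

In the paper's setting the event set, the context vocabulary and the PCA dimensionality would be far larger, and the Zipf-based context construction follows the authors' earlier word-embedding work cited as reference [3].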

Key words: Context distribution, Event representation, Open domain event extraction, Zipf's co-occurrence matrix

CLC Number: TP391

References
[1]CHEN Y,LIU S,ZHANG X,et al.Automatically labeled data generation for large scale event extraction[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.Vancouver,Canada:ACL,2017:409-419.
[2]HARRIS Z S.Distributional structure[J].Word,1954,10(2/3):146-162.
[3]GAO L,ZHOU G,LUO J,et al.Word Embedding With Zipf's Context[J].IEEE Access,2019,7:168934-168943.
[4]MIKOLOV T,CHEN K,CORRADO G,et al.Linguistic regularities in continuous space word representations[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Seattle,USA:ACL,2013:746-751.
[5]PENNINGTON J,SOCHER R,MANNING C.Glove:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).Doha,Qatar:ACL,2014:1532-1543.
[6]BOJANOWSKI P,GRAVE E,JOULIN A,et al.Enriching word vectors with subword information[J].Transactions of the Association for Computational Linguistics,2017,5:135-146.
[7]NGUYEN T H,GRISHMAN R.Event detection and domain adaptation with convolutional neural networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.Beijing,China:ACL,2015:365-371.
[8]CHEN Y,XU L,LIU K,et al.Event extraction via dynamic multi-pooling convolutional neural networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.Beijing,China:ACL,2015:167-176.
[9]NGUYEN T H,GRISHMAN R.Graph convolutional networks with argument-aware pooling for event detection[C]//Thirty-Second AAAI Conference on Artificial Intelligence.Louisiana,USA:AAAI,2018.
[10]SHA L,QIAN F,CHANG B,et al.Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction[C]//Thirty-Second AAAI Conference on Artificial Intelligence.Louisiana,USA:AAAI,2018.
[11]ORR J W,TADEPALLI P,FERN X.Event Detection with Neural Networks:A Rigorous Empirical Evaluation[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.Brussels,Belgium:ACL,2018:999-1004.
[12]ZHANG J,QIN Y,ZHANG Y,et al.Extracting entities and events as a single task using a transition-based neural model[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence.Hawaii,USA:AAAI,2019:5422-5428.
[13]NGUYEN T M,NGUYEN T H.One for all:Neural joint modeling of entities and events[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Hawaii,USA:AAAI,2019:6851-6858.
[14]LIU J,CHEN Y,LIU K.Exploiting the Ground-Truth:An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Hawaii,USA:AAAI,2019,33:6754-6761.
[15]HUANG L,CASSIDY T,FENG X,et al.Liberal event extraction and event schema induction[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1:Long Papers).Berlin,Germany:ACL,2016:258-268.
[16]DING X,ZHANG Y,LIU T,et al.Deep learning for event-driven stock prediction[C]//Twenty-Fourth International Joint Conference on Artificial Intelligence.Menlo Park,CA:AAAI,2015.
[17]DING X,ZHANG Y,LIU T,et al.Knowledge-driven event embedding for stock prediction[C]//26th International Conference on Computational Linguistics:Technical Papers.Osaka,Japan:COLING,2016:2133-2142.
[18]ARORA S,LIANG Y,MA T.A simple but tough-to-beat baseline for sentence embeddings[C]//Proc. ICLR.Toulon,France,2017.
[19]BULLINARIA J A,LEVY J P.Extracting semantic representations from word co-occurrence statistics:A computational study[J].Behavior Research Methods,2007,39(3):510-526.
[20]BAKER C F,FILLMORE C J,LOWE J B.The Berkeley FrameNet project[C]//Proceedings of the 17th International Conference on Computational Linguistics.Quebec,Canada:ACL,1998:86-90.
[21]DEVLIN J,CHANG M W,LEE K,et al.Bert:Pre-training of deep bidirectional transformers for language understanding[C]//Proc.NAACL.2019:4171-4186.
[22]RANA D S,MISHRA P K.Paraphrase Detection using Dependency Tree Recursive Autoencoder[C]//2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS).IEEE,2019:678-683.