Computer Science, 2022, Vol. 49, Issue (11A): 211000161-7. doi: 10.11896/jsjkx.211000161

• Image Processing & Multimedia Technology •

Noise Event Classification Model Based on Multimodal Attention

WU He-xiang, WANG Zhong-qing, LI Pei-feng   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Online: 2022-11-10  Published: 2022-11-21
  • About author: WU He-xiang, born in 1998, postgraduate. His main research interests include natural language processing.
    WANG Zhong-qing, born in 1987, Ph.D., lecturer, is a member of China Computer Federation. His main research interest is natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61806137, 61702518, 61836007), Natural Science Foundation of Jiangsu Province (18KJB520043), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Abstract: Social media has become one of the main channels through which people obtain news and follow real-time events, owing to its low cost, easy access and rapid dissemination. It provides multimodal information, including text and images, for analyzing specific events, but this information also contains abundant irrelevant events and false information. To address this, this paper uses text-image pairs to determine whether the text and image provide information related to a specific event, and thereby identifies irrelevant noise events at the sentence level of the text. Motivated by the observation that the description in a text is often associated with the scene in the corresponding image, this paper proposes an attention-based method that combines text and image information to classify events. The method attends to the important information in the text and the image and promotes information interaction between the two modalities. Experimental results on CrisisMMD show that the model outperforms six strong baselines and effectively fuses features from different modalities into a superior joint representation.
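
The idea of letting text and image features attend to each other before classification can be illustrated with a minimal sketch. This is not the authors' model: the feature vectors are random stand-ins for real text and image encoder outputs, the scaled dot-product attention, mean pooling and linear classifier are assumptions made for illustration, and all function names (`attend`, `fuse`, `classify`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, context):
    """For each query vector, return an attention-weighted average of the
    context vectors (scaled dot-product attention, assumed here)."""
    scale = math.sqrt(len(queries[0]))
    width = len(context[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(width)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fuse(text_feats, image_feats):
    """Cross-modal fusion: text attends to image, image attends to text,
    then both enriched modalities are pooled and concatenated."""
    text_ctx = attend(text_feats, image_feats)
    image_ctx = attend(image_feats, text_feats)
    text_vec = mean_pool(
        [[a + b for a, b in zip(t, c)] for t, c in zip(text_feats, text_ctx)]
    )
    image_vec = mean_pool(
        [[a + b for a, b in zip(i, c)] for i, c in zip(image_feats, image_ctx)]
    )
    return text_vec + image_vec  # list concatenation -> joint representation


def classify(joint, weights, bias):
    """Linear layer plus softmax over two hypothetical classes:
    index 0 = event-related, index 1 = noise."""
    logits = [dot(joint, w) + b for w, b in zip(weights, bias)]
    return softmax(logits)


random.seed(0)
DIM = 8
# Random stand-ins for 5 token features and 4 image-region features.
text_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
image_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]

joint = fuse(text_feats, image_feats)
weights = [[random.gauss(0, 1) for _ in range(2 * DIM)] for _ in range(2)]
bias = [0.0, 0.0]
probs = classify(joint, weights, bias)
print(len(joint), probs)
```

In a trained system the classifier weights and the encoders would be learned from labeled text-image pairs; here they are untrained, so the printed probabilities carry no meaning beyond showing the data flow.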

Key words: Attention mechanism, Multimodal fusion, Noise event classification

CLC Number: TP391