Computer Science ›› 2022, Vol. 49 ›› Issue (9): 132-138. doi: 10.11896/jsjkx.220600022

• Computer Graphics & Multimedia •

Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification

ZHOU Xu1, QIAN Sheng-sheng2, LI Zhang-ming2, FANG Quan2, XU Chang-sheng2   

  1 Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450000, China
  2 National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2022-06-02 Revised:2022-07-05 Online:2022-09-15 Published:2022-09-09
  • About author: ZHOU Xu, born in 1997, postgraduate. His main research interests include natural language processing and multimedia computing analysis.
    XU Chang-sheng, born in 1969, Ph.D, professor. His main research interests include computer vision and multimedia computing analysis.
  • Supported by:
    National Natural Science Foundation of China(61936005).

Abstract: The rapid development of the Internet and the continuous expansion of social media have produced a wealth of social event information, and the task of social event classification has become increasingly challenging. Making full use of image-level and text-level information is the key to social event classification. However, most existing methods have two limitations: 1) most multi-modal methods assume that the samples of every modality are sufficient and complete, but in real applications this assumption does not always hold, and a modality of an event may be missing; 2) most methods simply concatenate the image features and text features of social events to obtain the multi-modal features used for classification. To address these limitations, this paper proposes a dual variational multi-modal attention network (DVMAN) for social event classification. Within DVMAN, a novel dual variational autoencoder network generates shared representations of social events and reconstructs the missing modality information in incomplete social event learning. Through distribution alignment and cross-reconstruction alignment, the image and text latent representations are doubly aligned to reduce the gap between modalities, and a generative model synthesizes the latent representation of any missing modality. In addition, a multi-modal fusion module integrates the fine-grained information of the images and texts of social events, so that the modalities complement and enhance each other. Extensive experiments on two publicly available event datasets show that DVMAN improves accuracy by more than 4% over existing advanced methods, demonstrating the superior performance of the proposed method for social event classification.
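
The abstract names the alignment terms but does not give their formulas, so the following is only a minimal sketch of how such terms are commonly written for two variational autoencoders with diagonal Gaussian posteriors. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the use of the 2-Wasserstein distance for distribution alignment, the L1 distance for cross-reconstruction, and the simple rule for choosing a latent vector when one modality is missing.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regulariser."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def wasserstein2(mu_a: Vector, logvar_a: Vector, mu_b: Vector, logvar_b: Vector) -> float:
    """2-Wasserstein distance between two diagonal Gaussians.

    Assumed form of 'distribution alignment': it pulls the image posterior
    and the text posterior of the same event towards each other.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum(
        (math.exp(0.5 * la) - math.exp(0.5 * lb)) ** 2
        for la, lb in zip(logvar_a, logvar_b)
    )
    return math.sqrt(mean_term + std_term)


def cross_reconstruction_loss(img: Vector, img_from_txt: Vector,
                              txt: Vector, txt_from_img: Vector) -> float:
    """Assumed form of 'cross-reconstruction alignment' (L1 distance).

    img_from_txt is the image feature decoded from the text latent, and
    txt_from_img is the text feature decoded from the image latent.
    """
    l1_img = sum(abs(a - b) for a, b in zip(img, img_from_txt))
    l1_txt = sum(abs(a - b) for a, b in zip(txt, txt_from_img))
    return l1_img + l1_txt


def fill_missing_latent(mu_img: Optional[Vector], mu_txt: Optional[Vector]) -> List[float]:
    """Pick a latent vector for an event whose image or text may be missing.

    Hypothetical rule: use the available modality's posterior mean, or the
    average of both means when the event is complete.
    """
    if mu_img is None and mu_txt is None:
        raise ValueError("at least one modality is required")
    if mu_img is None:
        return list(mu_txt)
    if mu_txt is None:
        return list(mu_img)
    return [(a + b) / 2.0 for a, b in zip(mu_img, mu_txt)]
```

In a full model the posterior means and log-variances would come from the image and text encoders, and these terms would be weighted and added to each autoencoder's own reconstruction loss; the weights and the encoders are outside the scope of this sketch.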

Key words: Multi-modal, Social event classification, Social media, Incomplete data learning

CLC Number: 

  • TP391