Video Character Relation Extraction Based on Multi-feature Fusion and Fine-granularity Analysis

doi:10.11896/jsjkx.200800160

Abstract

Abstract: Video character relation extraction is an important task in information extraction. It is valuable for video description, video retrieval, character search, and public security supervision. Because of the large gap between the low-level pixels of video data and high-level relation semantics, existing methods struggle to extract character relations accurately. Most existing studies rely on coarse-grained analysis, such as the co-occurrence of characters, and ignore the fine-grained information in semantically rich video. To address the difficulty of extracting relations among video characters accurately and completely, this paper proposes a method for video character relation extraction based on multi-feature fusion and fine-grained analysis. First, to recognize character entities accurately, a character entity recognition model named CRMF (Character Recognition based on Multi-feature Fusion) is proposed; by fusing face and body features, it generates a more complete character set. Second, a character relation recognition model based on fine-grained features, named FGAG (Fine-Granularity Analysis based on GCN), is proposed. It not only fuses the spatio-temporal features of the characters in the video but also considers fine-grained information about objects related to the characters, so that a better mapping can be established to identify character relations accurately. Experiments on movie video data and the SRIV character relation recognition dataset verify the effectiveness and accuracy of the model: compared with existing models of the same kind, the F₁ value of character entity recognition increases by about 14.4% and the accuracy of character relation recognition increases by about 10.1%.

Key words: Character relation recognition, Deep learning, Multi-feature fusion, Relation extraction, Video analysis
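
The fusion of face and body features that the abstract attributes to CRMF can be illustrated with a minimal sketch. The abstract does not say how the features are fused, so everything below is an assumption made for illustration rather than the authors' method: the weighted sum of cosine similarities, the `face_weight` and `threshold` values, and the names `cosine`, `fused_similarity`, and `identify` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_similarity(face_a, body_a, face_b, body_b, face_weight=0.6):
    """Hypothetical late fusion: weighted sum of face and body similarity.

    When either face feature is missing (None), for example because the
    character is seen from behind, only the body similarity is used.
    """
    body_sim = cosine(body_a, body_b)
    if face_a is None or face_b is None:
        return body_sim
    return face_weight * cosine(face_a, face_b) + (1.0 - face_weight) * body_sim


def identify(face, body, gallery, threshold=0.5):
    """Return the gallery name with the highest fused score above threshold, else None."""
    best_name, best_score = None, threshold
    for name, (g_face, g_body) in gallery.items():
        score = fused_similarity(face, body, g_face, g_body)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy gallery: each character has an assumed (face_feature, body_feature) pair.
gallery = {
    "A": ([1.0, 0.0], [1.0, 0.0]),
    "B": ([0.0, 1.0], [0.0, 1.0]),
}
match_with_face = identify([1.0, 0.0], [0.9, 0.1], gallery)
match_without_face = identify(None, [0.0, 1.0], gallery)
```

In this toy setup a detection with no visible face can still be matched through its body feature, which is one way a fused model could yield a more complete character set than a face-only one.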

CLC number:

TP391

LYU Jin-na (吕金娜), XING Chun-yu (邢春玉), LI Li (李莉). Video Character Relation Extraction Based on Multi-feature Fusion and Fine-granularity Analysis[J]. Computer Science (计算机科学), 2021, 48(4): 117-122. https://doi.org/10.11896/jsjkx.200800160


