Computer Science, 2020, Vol. 47, Issue (12): 161-168. doi: 10.11896/jsjkx.200800209
姚林丽, 陈师哲, 金琴
YAO Lin-li, CHEN Shi-zhe, JIN Qin
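Abstract: This paper studies text-based fine-grained visual reasoning in the makeup domain. Specifically, it investigates a novel multimodal task: given an ordered sequence of makeup-step descriptions, arrange the shuffled face images taken during the makeup process into the correct order. Based on data processing and analysis, two ordering models are proposed for this task. The first model takes a unimodal perspective and orders the images using image information alone. The second takes a multimodal perspective and guides the ordering by establishing correspondences between the text descriptions and the images. Extensive experiments and analyses on the YouMakeup VQA Challenge dataset show that the two models are complementary across different kinds of image-pair data and perform well on the makeup image ordering task, reaching selection accuracies of 70% and 58.93%, respectively, on the test set.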
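The abstract does not describe the two models' internals. The sketch below is only meant to make the task framing concrete, and it rests on several assumptions. First, each test question is assumed to offer several candidate orderings, with "selection accuracy" taken to mean the fraction of questions where the chosen candidate is correct. Second, the unimodal model is assumed to reduce to a pairwise score P(image a comes before image b). Third, the multimodal model is assumed to yield a step-to-image similarity matrix. All function names and toy scores here are hypothetical and are not the authors' code.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical pairwise scorer: returns P(image a precedes image b).
PairScorer = Callable[[int, int], float]


def score_order_pairwise(order: Sequence[int], p_before: PairScorer) -> float:
    """Unimodal-style score (assumed): sum of log P(a before b) over every
    pair of images, in the positions the candidate ordering puts them."""
    eps = 1e-9  # guard against log(0)
    total = 0.0
    for i, j in combinations(range(len(order)), 2):
        # combinations yields i < j, so order[i] is placed before order[j]
        total += math.log(max(p_before(order[i], order[j]), eps))
    return total


def score_order_alignment(order: Sequence[int], sim: List[List[float]]) -> float:
    """Multimodal-style score (assumed): sim[k][img] is the similarity between
    step description k and image img; sum it over the image placed at step k."""
    return sum(sim[k][img] for k, img in enumerate(order))


def choose_candidate(candidates: List[Sequence[int]],
                     score: Callable[[Sequence[int]], float]) -> int:
    """Return the index of the highest-scoring candidate ordering."""
    return max(range(len(candidates)), key=lambda c: score(candidates[c]))


def selection_accuracy(predicted: List[int], gold: List[int]) -> float:
    """Fraction of questions whose chosen candidate matches the gold answer."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy example with made-up scores: three images whose true order is 2, 0, 1.
true_order = [2, 0, 1]
rank = {img: pos for pos, img in enumerate(true_order)}


def toy_p_before(a: int, b: int) -> float:
    # A confident but imperfect pairwise scorer.
    return 0.9 if rank[a] < rank[b] else 0.1


candidates = [[0, 1, 2], [2, 0, 1], [1, 2, 0], [2, 1, 0]]
# Toy similarity: step k matches only the image that truly belongs at step k.
sim = [[1.0 if img == true_order[k] else 0.0 for img in range(3)] for k in range(3)]

pick_uni = choose_candidate(candidates, lambda o: score_order_pairwise(o, toy_p_before))
pick_multi = choose_candidate(candidates, lambda o: score_order_alignment(o, sim))
# Both pick index 1, i.e. the ordering [2, 0, 1].
```

Scoring whole candidate orderings, rather than decoding one permutation from scratch, fits the assumed multiple-choice setting. If the two scorers err on different image pairs, as the reported complementarity suggests, their scores could also be combined before the argmax.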