Computer Science ›› 2020, Vol. 47 ›› Issue (12): 161-168. doi: 10.11896/jsjkx.200800209

• Computer Graphics & Multimedia •

Fine-grained Facial Makeup Image Ordering via Language

YAO Lin-li, CHEN Shi-zhe, JIN Qin

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received: 2020-07-30 Revised: 2020-09-06 Online: 2020-12-15 Published: 2020-12-17
  • Corresponding author: JIN Qin (qjin@ruc.edu.cn)
  • About author: YAO Lin-li (linliyao@ruc.edu.cn), born in 1998, postgraduate. Her main research interests include image-text matching and visual semantic understanding.
    JIN Qin, born in 1972, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include multimedia computing and human-computer interaction.
  • Supported by:
    National Natural Science Foundation of China (61772535), Natural Science Foundation of Beijing (4192028) and National Key Research and Development Plan (2016YFB1001202).

Abstract: This paper studies text-based fine-grained visual reasoning in the makeup domain and explores a novel multi-modal task: sorting a set of shuffled facial images from a makeup video into the correct order according to the given ordered step descriptions. For this task, the paper first processes and analyzes the data to learn the characteristics of the makeup dataset, and then proposes two baseline models for image ordering. The first model takes a single-modal approach: it uses only image information and ignores the guidance of the text descriptions. The second model takes a multi-modal approach: it uses text semantics to guide image ordering by establishing the relationship between the step descriptions and the images, and it can reason about the change in visual appearance that each step brings. Extensive experiments on the YouMakeup VQA Challenge dataset show that the two models are complementary on different image pairs and perform well on the image ordering task, with selection accuracies on the test set of 70% and 58.93%, respectively.

Key words: Deep learning, Fine-grained, Image ordering, Makeup domain, Multi-modal, Visual reasoning
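
To make the text-guided ordering idea in the abstract concrete, the following is a minimal sketch and not the authors' model. It assumes that some image-text matching model has already produced a score for every (shuffled image, ordered step description) pair; the `similarity` matrix and the function name `order_images_by_steps` are hypothetical. Each image is assigned to the step it matches best, and the images are then sorted by that step index.

```python
from typing import List, Sequence


def order_images_by_steps(similarity: Sequence[Sequence[float]]) -> List[int]:
    """Return image indices sorted by the step each image matches best.

    similarity[i][j] is a hypothetical matching score between shuffled
    image i and ordered step description j (higher means a better match).
    """
    def best_step(i: int) -> int:
        row = similarity[i]
        # Index of the step description with the highest score for image i.
        return max(range(len(row)), key=lambda j: row[j])

    # sorted() is stable, so images tied on the same step keep input order.
    return sorted(range(len(similarity)), key=best_step)


# Toy scores (assumed values): 3 shuffled images x 3 ordered steps.
scores = [
    [0.1, 0.2, 0.9],  # image 0 best matches step 2
    [0.8, 0.1, 0.2],  # image 1 best matches step 0
    [0.2, 0.7, 0.3],  # image 2 best matches step 1
]
predicted_order = order_images_by_steps(scores)
```

With these toy scores the predicted order is image 1, then image 2, then image 0. A greedy per-image assignment is only one possible design; it does not prevent two images from claiming the same step, which a real system would need to handle.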

CLC Number:

  • TP37