Computer Science, 2020, Vol. 47, Issue (12): 161-168. doi: 10.11896/jsjkx.200800209


Fine-grained Facial Makeup Image Ordering via Language

YAO Lin-li, CHEN Shi-zhe, JIN Qin   

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received:2020-07-30 Revised:2020-09-06 Online:2020-12-15 Published:2020-12-17
  • About author: YAO Lin-li, born in 1998, postgraduate. Her main research interests include image-text matching and visual semantic understanding.
    JIN Qin, born in 1972, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. Her main research interests include multimedia computing and human-computer interaction.
  • Supported by:
    National Natural Science Foundation of China (61772535), Natural Science Foundation of Beijing (4192028) and National Key Research and Development Plan (2016YFB1001202).

Abstract: This paper studies text-based fine-grained visual reasoning in the makeup domain and explores a novel multi-modal task, which sorts a set of facial images from a makeup video into the correct order according to given ordered step descriptions. For this task, this paper first processes and analyzes the data to learn the characteristics of the makeup dataset, and then proposes two baseline models to solve the image ordering task. The first baseline model takes a single-modal approach: it uses only image information and ignores the guiding role of the text descriptions. The second model uses text semantics to guide image ordering: it establishes the relationship between the step descriptions and the images and can reason about the visual appearance changes brought about by each step. This paper conducts extensive experiments on the YouMakeup VQA dataset. The experiments show that the two models are complementary and achieve good performance on the image ordering task, with selection accuracies of 70% and 58.93% on the test set, respectively.
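The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' model; it is a minimal illustration under stated assumptions: each image and each step description is assumed to be already encoded as a feature vector (encoders not shown), a set of n images is assumed to come with n-1 step descriptions that each describe the change between consecutive images, and the task is assumed to be posed as choosing one ordering from a list of candidates, which is how "selection accuracy" is computed here. The scoring rule (cosine similarity between the feature difference of consecutive images and the matching step embedding) and all function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_ordering(order: Sequence[int], image_feats: Sequence[Vector],
                   step_feats: Sequence[Vector]) -> float:
    """Hypothetical text-guided score for one candidate ordering.

    Assumes step_feats[k] describes the change from the k-th to the (k+1)-th
    image, so the feature difference of each consecutive pair is compared
    with the matching step embedding.
    """
    total = 0.0
    for k in range(len(order) - 1):
        before = image_feats[order[k]]
        after = image_feats[order[k + 1]]
        change = [x - y for x, y in zip(after, before)]
        total += cosine(change, step_feats[k])
    return total


def select_ordering(candidates: Sequence[Sequence[int]],
                    image_feats: Sequence[Vector],
                    step_feats: Sequence[Vector]) -> int:
    """Return the index of the highest-scoring candidate ordering (first on ties)."""
    scores = [score_ordering(c, image_feats, step_feats) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


def selection_accuracy(predicted: Sequence[int], answers: Sequence[int]) -> float:
    """Fraction of questions whose chosen candidate matches the ground truth."""
    correct = sum(p == a for p, a in zip(predicted, answers))
    return correct / len(answers)


# Toy example: 2-D vectors stand in for real image and text encodings.
image_feats = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # images 0, 1, 2
step_feats = [[1.0, 0.0], [0.0, 1.0]]               # step 1, step 2
candidates = [[0, 1, 2], [0, 2, 1], [2, 1, 0]]

best = select_ordering(candidates, image_feats, step_feats)
print(best, candidates[best])                            # 0 [0, 1, 2]
print(selection_accuracy([0, 2, 1, 1], [0, 2, 0, 1]))    # 0.75
```

Scoring feature differences rather than single images mirrors the abstract's point that the text-guided model reasons about the appearance change each step brings; in a learned model, the fixed cosine rule would presumably be replaced by trained components.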

Key words: Deep learning, Fine-grained, Image ordering, Makeup domain, Multi-modal, Visual reasoning

CLC Number: TP37

References:
[1] CHEN S,WANG W,RUAN L,et al.YouMakeup VQA Challenge:Towards Fine-grained Action Understanding in Domain-Specific Videos[J].arXiv:2004.05573.
[2] TONG W S,TANG C K,BROWN M S,et al.Example-based cosmetic transfer[C]//15th Pacific Conference on Computer Graphics and Applications (PG'07).IEEE,2007:211-218.
[3] GU Q,WANG G,CHIU M T,et al.Ladn:Local adversarial disentangling network for facial makeup and de-makeup[C]//IEEE/CVF International Conference on Computer Vision.Seoul,Korea (South):IEEE,2019:10480-10489.
[4] GUO D,SIM T.Digital face makeup by example[C]//IEEE Conference on Computer Vision and Pattern Recognition.Miami,FL:IEEE,2009:73-79.
[5] CHEN H J,HUI K M,WANG S Y,et al.Beautyglow:On-demand makeup transfer framework with reversible generative network[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach,CA,USA:IEEE,2019:10034-10042.
[6] LI Y,HUANG H,YU J,et al.Cosmetic-Aware Makeup Cleanser[J].arXiv:2004.09147.
[7] WANG W,WANG Y,CHEN S,et al.YouMakeup:A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension[C]//Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.Hong Kong,China:ACL,2019:5136-5146.
[8] VO N,JIANG L,SUN C,et al.Composing text and image for image retrieval-an empirical odyssey[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).Long Beach,CA,USA:IEEE,2019:6432-6441.
[9] NAM H,HA J W,KIM J.Dual attention networks for multimodal reasoning and matching[C]//IEEE Conference on Computer Vision and Pattern Recognition.Honolulu,HI:IEEE,2017:2156-2164.
[10] LEE K H,CHEN X,HUA G,et al.Stacked cross attention for image-text matching[C]//European Conference on Computer Vision.Springer,Cham,2018:201-216.
[11] LI K,ZHANG Y,LI K,et al.Visual semantic reasoning for image-text matching[C]//International Conference on Computer Vision.IEEE,2019:4654-4662.
[12] CHEN H,DING G,LIN Z,et al.Cross-modal image-text retrieval with semantic consistency[C]//Proceedings of the 27th ACM International Conference on Multimedia.Nice,France:ACM,2019:1749-1757.
[13] WANG T,XU X,YANG Y,et al.Matching images and text with multi-modal tensor fusion and re-ranking[C]//Proceedings of the 27th ACM International Conference on Multimedia.ACM,2019:12-20.
[14] GUO X,WU H,CHENG Y,et al.Dialog-based interactive image retrieval[C]//Advances in Neural Information Processing Systems.MIT Press,2018:678-688.
[15] HOSSEINZADEH M,WANG Y.Composed Query Image Retrieval Using Locally Bounded Features[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition.Seattle,WA,USA:IEEE,2020:3596-3605.
[16] PARK D H,DARRELL T,ROHRBACH A.Robust change captioning[C]//IEEE/CVF International Conference on Computer Vision.Seoul,Korea (South):IEEE,2019:4623-4632.
[17] TAN H,DERNONCOURT F,LIN Z,et al.Expressing visual relationships via language[J].arXiv:1906.07689.
[18] BENGIO Y,LOURADOUR J,COLLOBERT R,et al.Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning.Montreal,Quebec,Canada:ACM,2009:41-48.
[19] CHOPRA S,HADSELL R,LECUN Y.Learning a similarity metric discriminatively,with application to face verification[C]//Conference on Computer Vision and Pattern Recognition (CVPR'05).IEEE,2005:539-546.
[20] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2016:770-778.
[21] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[22] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems.MIT Press,2012:1097-1105.