Computer Science ›› 2022, Vol. 49 ›› Issue (1): 212-218. doi: 10.11896/jsjkx.201100143

• Computer Graphics & Multimedia •

Dual-stream Reconstruction Network for Multi-label and Few-shot Learning

FANG Zhong-li, WANG Zhe, CHI Zi-qiu

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2020-11-23 Revised: 2021-03-27 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: WANG Zhe (wangzhe@ecust.edu.cn)
  • About author: FANG Zhong-li, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include multi-label learning and deep learning.
    WANG Zhe, born in 1981, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include pattern recognition and image processing.
    Contact: 434383537@163.com
  • Supported by:
    Shanghai Science and Technology Program (20511100600); National Natural Science Foundation of China (62076094).
    National Social Science Fund of China (15BGL048).

Abstract: Multi-label image classification is an important problem in computer vision: the model must predict every label present in an image. An image usually carries more than one label, and variations in the size, position, and pose of objects further degrade classification performance. Improving the representational accuracy of image features is therefore a pressing problem. To address it, this paper proposes a novel dual-stream reconstruction network for image feature extraction. First, the model applies a dual-stream attention network that extracts features based on channel information and on spatial information, then concatenates the two so that the image representation retains both channel and spatial detail. Second, the model introduces a reconstruction loss that constrains the dual-stream network, forcing the two divergent feature streams toward the same representational ability and thereby pushing both toward the ground-truth features. Experiments on the VOC 2007 and MS COCO multi-label image datasets show that the proposed network extracts salient features accurately and effectively and yields better classification accuracy. In addition, given the sparsifying effect of the reconstruction loss on model features, the method is also applied to few-shot learning, where experiments show that it likewise achieves good classification accuracy.

Key words: Multi-label image recognition, Feature reconstruction, Deep learning, Few-shot learning, Image attention mechanism

CLC Number:

  • TP183
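
The abstract describes the model only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes a feature map already produced by some backbone, uses simple sigmoid gates for the channel and spatial attention streams, concatenates the pooled features of both streams, and adds a reconstruction term to a multi-label classification loss. The gate design, the use of mean squared error between the two pooled streams as the reconstruction loss, the weighting factor `lam`, and every function name here are assumptions introduced for illustration; none of them comes from the paper.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_stream(fmap: FeatureMap) -> FeatureMap:
    """Assumed channel attention: gate each channel by its global average."""
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_stream(fmap: FeatureMap) -> FeatureMap:
    """Assumed spatial attention: gate each location by its cross-channel mean."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
              for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * gates[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each channel to a single scalar descriptor."""
    return [sum(v for row in ch for v in row) / (len(ch) * len(ch[0]))
            for ch in fmap]


def reconstruction_loss(a: List[float], b: List[float]) -> float:
    """Assumed form: mean squared error pulling the two streams together."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce_loss(logits: List[float], targets: List[int]) -> float:
    """Binary cross-entropy averaged over labels (one sigmoid per label)."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


def forward(fmap: FeatureMap, weights: List[List[float]], bias: List[float],
            targets: List[int], lam: float = 0.1
            ) -> Tuple[List[float], List[float], float]:
    """Run both streams, concatenate, classify, and combine the two losses."""
    f_ch = global_avg_pool(channel_stream(fmap))
    f_sp = global_avg_pool(spatial_stream(fmap))
    fused = f_ch + f_sp  # feature concatenation: length 2 * C
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    loss = bce_loss(logits, targets) + lam * reconstruction_loss(f_ch, f_sp)
    return fused, logits, loss
```

Calling `forward` with a `C x H x W` feature map, a linear classifier of shape `num_labels x 2C`, and a 0/1 target vector returns the fused feature, the per-label logits, and the combined loss. Each label gets its own sigmoid because several labels can be present in one image, which is what distinguishes this setting from single-label softmax classification.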