Computer Science, 2022, Vol. 49, Issue (1): 212-218. doi: 10.11896/jsjkx.201100143

• Computer Graphics & Multimedia •

Dual-stream Reconstruction Network for Multi-label and Few-shot Learning

FANG Zhong-li, WANG Zhe, CHI Zi-qiu

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2020-11-23 Revised: 2021-03-27 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: WANG Zhe (wangzhe@ecust.edu.cn)
  • About author: FANG Zhong-li, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include multi-label learning and deep learning.
    WANG Zhe, born in 1981, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include pattern recognition and image processing.
  • Author e-mail: 434383537@163.com
  • Supported by:
    Shanghai Science and Technology Program (20511100600); National Natural Science Foundation of China (62076094); National Social Science Fund of China (15BGL048).

Abstract: Multi-label image classification is one of the important problems in computer vision: the model must predict every label present in an image. An image usually carries more than one label, and variations in the size, position, and pose of the objects in it affect classification performance. How to improve the accuracy of the image feature representation is therefore an urgent problem. To address it, a novel dual-stream reconstruction network is proposed for image feature extraction. Specifically, the model first applies a dual-stream attention network to extract features based on channel information and on spatial information, and concatenates them so that the image features carry both channel-wise and spatial detail. Secondly, a reconstruction loss function is introduced to constrain the features of the dual-stream network, forcing the two divergent features to have the same expressive ability and thereby pushing the extracted dual-stream features toward the ground-truth features. Experimental results on the VOC 2007 and MS COCO multi-label image datasets show that the proposed dual-stream reconstruction network extracts salient features accurately and effectively and achieves better classification accuracy. In addition, given the sparsifying effect of the reconstruction loss on model features, which helps counter overfitting, the method is also applied to few-shot learning. The experimental results show that the proposed model also achieves good classification accuracy in the few-shot setting.

Key words: Deep learning, Feature reconstruction, Few-shot learning, Image attention mechanism, Multi-label image recognition
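
The pipeline described in the abstract (channel stream, spatial stream, feature concatenation, and a reconstruction constraint between the two streams) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The gating functions, the mean-squared-error form of the reconstruction term, the linear classifier `W`, and the weight `0.5` are all assumptions made for illustration; only the overall structure and the 20-class label space of VOC 2007 come from the text and the dataset.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat has shape (C, H, W); gate each channel by its global average (assumed form).
    gate = sigmoid(feat.mean(axis=(1, 2)))           # shape (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Gate each spatial location by its average over channels (assumed form).
    gate = sigmoid(feat.mean(axis=0))                # shape (H, W)
    return feat * gate[None, :, :]

def dual_stream(feat):
    # Pool each attended map to a vector, then concatenate the two streams.
    ch = channel_attention(feat).mean(axis=(1, 2))   # shape (C,)
    sp = spatial_attention(feat).mean(axis=(1, 2))   # shape (C,)
    return ch, sp, np.concatenate([ch, sp])          # fused has shape (2C,)

def reconstruction_loss(ch, sp):
    # Assumed stand-in: mean squared error pulling the two streams together.
    return float(np.mean((ch - sp) ** 2))

def bce_loss(logits, targets, eps=1e-7):
    # Multi-label binary cross-entropy over independent labels.
    p = sigmoid(logits)
    return float(-np.mean(targets * np.log(p + eps)
                          + (1.0 - targets) * np.log(1.0 - p + eps)))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))        # toy backbone feature map, C=8
W = rng.standard_normal((20, 16)) * 0.1      # hypothetical classifier, 20 labels
targets = np.zeros(20)
targets[[3, 7]] = 1.0                        # two labels present in this image

ch, sp, fused = dual_stream(feat)
total = bce_loss(W @ fused, targets) + 0.5 * reconstruction_loss(ch, sp)
print(fused.shape, total)
```

In this sketch the classification term sees the concatenated feature, while the reconstruction term acts only on the two stream vectors, which mirrors the division of roles stated in the abstract.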

CLC Number:

  • TP183
[1] 饶志双, 贾真, 张凡, 李天瑞.
Key-Value Relational Memory Networks for Question Answering over Knowledge Graph
Computer Science, 2022, 49(9): 202-207. https://doi.org/10.11896/jsjkx.220300277
[2] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
Computer Science, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[3] 徐涌鑫, 赵俊峰, 王亚沙, 谢冰, 杨恺.
Temporal Knowledge Graph Representation Learning
Computer Science, 2022, 49(9): 162-171. https://doi.org/10.11896/jsjkx.220500204
[4] 王剑, 彭雨琦, 赵宇斐, 杨健.
Survey of Social Network Public Opinion Information Extraction Based on Deep Learning
Computer Science, 2022, 49(8): 279-293. https://doi.org/10.11896/jsjkx.220300099
[5] 郝志荣, 陈龙, 黄嘉成.
Class Discriminative Universal Adversarial Attack for Text Classification
Computer Science, 2022, 49(8): 323-329. https://doi.org/10.11896/jsjkx.220200077
[6] 姜梦函, 李邵梅, 郑洪浩, 张建朋.
Rumor Detection Model Based on Improved Position Embedding
Computer Science, 2022, 49(8): 330-335. https://doi.org/10.11896/jsjkx.210600046
[7] 孙奇, 吉根林, 张杰.
Non-local Attention Based Generative Adversarial Network for Video Abnormal Event Detection
Computer Science, 2022, 49(8): 172-177. https://doi.org/10.11896/jsjkx.210600061
[8] 侯钰涛, 阿布都克力木·阿布力孜, 哈里旦木·阿布都克里木.
Advances in Chinese Pre-training Models
Computer Science, 2022, 49(7): 148-163. https://doi.org/10.11896/jsjkx.211200018
[9] 周慧, 施皓晨, 屠要峰, 黄圣君.
Robust Deep Neural Network Learning Based on Active Sampling
Computer Science, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[10] 苏丹宁, 曹桂涛, 王燕楠, 王宏, 任赫.
Survey of Deep Learning for Radar Emitter Identification Based on Small Sample
Computer Science, 2022, 49(7): 226-235. https://doi.org/10.11896/jsjkx.210600138
[11] 胡艳羽, 赵龙, 董祥军.
Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification
Computer Science, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092
[12] 程成, 降爱莲.
Real-time Semantic Segmentation Method Based on Multi-path Feature Extraction
Computer Science, 2022, 49(7): 120-126. https://doi.org/10.11896/jsjkx.210500157
[13] 刘伟业, 鲁慧民, 李玉鹏, 马宁.
Survey on Finger Vein Recognition Research
Computer Science, 2022, 49(6A): 1-11. https://doi.org/10.11896/jsjkx.210400056
[14] 孙福权, 崔志清, 邹彭, 张琨.
Brain Tumor Segmentation Algorithm Based on Multi-scale Features
Computer Science, 2022, 49(6A): 12-16. https://doi.org/10.11896/jsjkx.210700217
[15] 康雁, 徐玉龙, 寇勇奇, 谢思宇, 杨学昆, 李浩.
Drug-Drug Interaction Prediction Based on Transformer and LSTM
Computer Science, 2022, 49(6A): 17-21. https://doi.org/10.11896/jsjkx.210400150