Computer Science ›› 2020, Vol. 47 ›› Issue (1): 136-143. doi: 10.11896/jsjkx.181202316

• Computer Graphics & Multimedia •

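Few-shot Food Image Recognition Combining Triplet Convolutional Neural Network with Relation Network

LV Yong-qiang1,2, MIN Wei-qing2, DUAN Hua1, JIANG Shu-qiang2

  1. (College of Mathematics and System Science, Shandong University of Science and Technology, Qingdao, Shandong 266590)1;
    (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)2
  • Received: 2018-12-14 Published: 2020-01-19
  • Corresponding author: DUAN Hua (huaduan59@163.com)
  • Supported by:
    National Natural Science Foundation of China (61532018, 61602437); Humanities and Social Science Research Project of the Ministry of Education (18YJAZH017); Natural Science Foundation of Shandong Province (ZR2017MF027); Leading Talents and Outstanding Research Team Program of Shandong University of Science and Technology (2015TDJH102)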

Few-shot Food Recognition Combining Triplet Convolutional Neural Network with Relation Network

LV Yong-qiang1,2, MIN Wei-qing2, DUAN Hua1, JIANG Shu-qiang2

  1. (College of Mathematics and System Science, Shandong University of Science and Technology, Qingdao, Shandong 266590, China)1;
    (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)2
  • Received: 2018-12-14 Published: 2020-01-19
  • About author: LV Yong-qiang, born in 1992, postgraduate. His main research interests include deep learning, computer vision and machine learning; DUAN Hua, born in 1976, Ph.D., professor. Her main research interests include Petri nets, process mining and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61472229, 61602279, 71704096, 31671588), Sci. & Tech. Development Fund of Shandong Province of China (2016ZDJS02A11, ZR2017BF015, ZR2017MF027), Humanities and Social Science Research Project of the Ministry of Education (16YJCZH154, 16YJCZH041, 16YJCZH012, 18YJAZH017), Taishan Scholar Climbing Program of Shandong Province, and SDUST Research Fund (2015TDJH102).

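Abstract (Chinese version): Food recognition has received wide attention in fields such as food health and smart homes. Most current food recognition work is based on deep neural networks trained on large-scale labeled samples; such work cannot effectively recognize categories that have only a few samples, so few-shot food recognition is a problem that urgently needs solving. Current metric-learning-based few-shot recognition methods focus on similarity information between samples and neglect finer-grained intra-class and inter-class distinctions. The mainstream approach to learning intra-class and inter-class discriminative information is the triplet convolutional neural network with a linear metric function; for food images, however, the discriminative power of a linear metric function is insufficient. To this end, a learnable relation network is introduced as the non-linear metric function of the triplet convolutional neural network, and a triplet neural network based on a non-linear metric is proposed for few-shot food recognition. The method uses a triplet neural network to learn feature embeddings of images and then adopts the more discriminative relation network as the non-linear metric function, learning finer-grained intra-class and inter-class discriminative information through end-to-end training. In addition, an online sampling scheme for triplet samples is proposed that makes model training more stable. Experimental results on the Food-101, VIREO Food-172 and ChineseFoodNet food datasets show that the proposed method improves performance by 3.0% on average over few-shot learning methods based on Siamese networks, and by 1.0% on average over the triplet neural network method based on a linear metric function. The paper also investigates the effects of the threshold (margin) of the loss function, the triplet sampling parameters, and the initialization method on experimental performance.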

Keywords (Chinese version): Non-linear metric, Triplet neural network, Food recognition, Few-shot recognition

Abstract: Food recognition has attracted wide attention in fields such as food health and smart homes. Most existing work relies on deep neural networks trained on large-scale labeled samples and therefore cannot robustly recognize food categories that have only a few samples, which makes few-shot food recognition an urgent problem. Most metric-learning-based few-shot recognition methods emphasize the similarity values of image pairs and pay little attention to finer-grained inter-class and intra-class variations. The mainstream way to learn inter-class and intra-class information is a triplet convolutional neural network with a linear metric function; however, a linear metric function is not discriminative enough to measure the similarity of food images. To address this problem, this paper uses a learnable relation network as a non-linear metric and proposes a triplet network with a relation network, which addresses both of the shortcomings above. The model adopts a triplet network as the feature embedding network for image feature learning, uses a more discriminative relation network as the non-linear metric to learn inter-class and intra-class information, and is trained end-to-end. In addition, this paper proposes an online mining rule for triplet samples that makes the model more stable during training. Comprehensive experiments were conducted on three food datasets: Food-101, VIREO Food-172 and ChineseFoodNet. Compared with popular few-shot learning methods such as Relation Network and Matching Network, the proposed model achieves an average improvement of about 3.0%; compared with a triplet network with a linear metric, it achieves an average improvement of about 1.0%. The paper also explores the influence of the margin in the loss function, the parameter settings of online triplet sampling, and the initialization method on experimental performance.
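
The central idea in the abstract, a triplet margin loss whose similarity comes from a learnable non-linear relation module rather than a fixed metric, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the loss formula, the relation network architecture, or the margin value, so the one-hidden-layer scorer, the hinge form `max(0, margin - s_pos + s_neg)`, the function names, and the squared Euclidean baseline are hypothetical stand-ins rather than the authors' implementation.

```python
import math


def relation_score(x, y, w_hidden, w_out):
    """Non-linear similarity in (0, 1) between two embedding vectors.

    Hypothetical stand-in for a relation network: the two embeddings are
    concatenated, passed through one ReLU layer, then squashed by a sigmoid.
    """
    pair = list(x) + list(y)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, pair))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def triplet_relation_loss(anchor, positive, negative, w_hidden, w_out, margin=0.5):
    """Hinge loss on relation scores (assumed form, not taken from the paper).

    The loss is zero once score(anchor, positive) exceeds
    score(anchor, negative) by at least `margin`.
    """
    s_pos = relation_score(anchor, positive, w_hidden, w_out)
    s_neg = relation_score(anchor, negative, w_hidden, w_out)
    return max(0.0, margin - s_pos + s_neg)


def fixed_metric_triplet_loss(anchor, positive, negative, margin=0.5):
    """Baseline with a fixed, non-learned squared Euclidean distance.

    Used here only as a contrast to the learned scorer above.
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    # Toy 2-D embeddings and hand-picked weights, purely illustrative.
    w_hidden = [[0.0, 0.0, 1.0, 0.0]]
    w_out = [1.0]
    anchor, positive, negative = [0.0, 0.0], [3.0, 0.0], [-3.0, 0.0]
    print(triplet_relation_loss(anchor, positive, negative, w_hidden, w_out))
    print(fixed_metric_triplet_loss(anchor, positive, negative))
```

In an end-to-end setting the weights of the scorer would be trained jointly with the embedding network, which is what distinguishes it from the fixed-metric baseline; the sketch only evaluates the loss for given weights and does not perform any training or triplet sampling.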

Key words: Few-shot learning, Food recognition, Non-linear metric, Triplet network

CLC Number: 

  • TP391