Computer Science ›› 2020, Vol. 47 ›› Issue (1): 136-143.doi: 10.11896/jsjkx.181202316

• Computer Graphics & Multimedia •

Few-shot Food Recognition Combining Triplet Convolutional Neural Network with Relation Network

LV Yong-qiang1,2,MIN Wei-qing2,DUAN Hua1,JIANG Shu-qiang2   

  1. College of Mathematics and System Science, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-12-14  Published: 2020-01-19
  • About author: LV Yong-qiang, born in 1992, postgraduate. His main research interests include deep learning, computer vision and machine learning. DUAN Hua, born in 1976, Ph.D, professor. Her main research interests include Petri nets, process mining and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61472229, 61602279, 71704096, 31671588), the Sci. & Tech. Development Fund of Shandong Province of China (2016ZDJS02A11, ZR2017BF015, ZR2017MF027), the Humanities and Social Science Research Project of the Ministry of Education (16YJCZH154, 16YJCZH041, 16YJCZH012, 18YJAZH017), the Taishan Scholar Climbing Program of Shandong Province, and the SDUST Research Fund (2015TDJH102).

Abstract: Food recognition has attracted wide attention in food health and smart home applications. Most existing work addresses food recognition with large-scale labeled samples and therefore fails to robustly recognize food categories that have only a few samples, which makes few-shot food recognition an urgent problem. Most metric-learning-based few-shot recognition methods emphasize the similarity values of image pairs without paying sufficient attention to inter-class and intra-class variations. Existing approaches mainly use a triplet convolutional neural network with a linear metric function to learn inter-class and intra-class information; however, the linear metric function is not discriminative enough for measuring the similarity of food images. To address these two shortcomings, this paper uses a learnable relation network as a non-linear metric and proposes a triplet network combined with a relation network. The model adopts the triplet network as the feature embedding network for image feature learning and uses the more discriminative relation network as the non-linear metric to learn inter-class and intra-class information. The proposed model is trained end-to-end. In addition, this paper proposes an online mining rule for triplet samples, which makes the model stable during training. Comprehensive experiments were conducted on three food datasets: Food-101, VIREO Food-172 and ChineseFoodNet. Compared with popular few-shot learning methods such as Relation Network and Matching Network, the proposed model achieves an average improvement of about 3.0%; compared with a triplet network with a linear metric, it achieves an average improvement of about 1.0%. This paper also explores how the margin in the loss function, the parameter settings of online triplet sampling and the initialization method affect experimental performance.
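
The idea of scoring a triplet with a learnable non-linear metric, as opposed to a fixed linear distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formula, so the hinge form, the tiny two-layer scoring function, the margin value of 0.5 and all names (`relation_score`, `triplet_relation_loss`, `w_hidden`, `w_out`) are assumptions chosen for illustration, and the embeddings are hand-written vectors standing in for CNN features.

```python
import math
import random


def relation_score(x, y, w_hidden, w_out):
    """Hypothetical non-linear metric: a tiny MLP on the concatenated pair.

    Returns a similarity score squashed into [0, 1] by a sigmoid.
    """
    pair = x + y  # concatenate the two embedding vectors
    hidden = [max(0.0, sum(w * v for w, v in zip(row, pair))) for row in w_hidden]  # ReLU layer
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def triplet_relation_loss(anchor, positive, negative, w_hidden, w_out, margin=0.5):
    """Assumed hinge loss: the anchor-positive score should exceed
    the anchor-negative score by at least `margin`."""
    s_pos = relation_score(anchor, positive, w_hidden, w_out)
    s_neg = relation_score(anchor, negative, w_hidden, w_out)
    return max(0.0, margin - s_pos + s_neg)


# Randomly initialised weights stand in for a trained relation module.
rng = random.Random(0)
dim, hidden_units = 4, 8
w_hidden = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(hidden_units)]
w_out = [rng.uniform(-1, 1) for _ in range(hidden_units)]

# Toy embeddings: anchor and positive from one class, negative from another.
anchor = [0.9, 0.1, 0.0, 0.2]
positive = [0.8, 0.2, 0.1, 0.1]
negative = [0.0, 0.9, 0.7, 0.1]

loss = triplet_relation_loss(anchor, positive, negative, w_hidden, w_out)
print(f"triplet loss with relation metric: {loss:.4f}")
```

In an end-to-end setting, the weights of both the embedding network and the scoring function would be updated by the gradient of this loss, so the metric itself is learned together with the features.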

Key words: Few-shot learning, Food recognition, Non-linear metric, Triplet network

CLC Number: TP391