Computer Science, 2020, Vol. 47, Issue (4): 125-130. doi: 10.11896/jsjkx.190700163

• Computer Graphics & Multimedia •

3D Shape Recognition Based on Multi-task Learning with Limited Multi-view Data

ZHOU Zi-qin, YAN Hua

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2019-07-23 Online: 2020-04-15 Published: 2020-04-15
  • Contact: YAN Hua (yanhua@scu.edu.cn), born in 1971, professor. His main research interests include pattern recognition and intelligent systems.
  • About author: ZHOU Zi-qin, born in 1996, postgraduate. Her main research interests include machine learning, deep learning, computer vision, 3D shape analysis, active learning and neural architecture search.
  • Supported by:
    This work was supported by the Key Research and Development Program of Sichuan Province (2019YFG0409).

Abstract: With the rapid development of 3D scanning technology, 3D shape analysis has attracted wide attention from researchers. In particular, following the success of deep learning in computer vision, multi-view methods have become the dominant approach to 3D shape recognition. Previous work has shown that the amount of 3D training data strongly affects the final classification accuracy. However, 3D shape data is hard to collect because it requires professional 3D scanning equipment. As a result, existing public 3D benchmark datasets are far smaller than 2D datasets, which impedes the development of 3D shape analysis. To address this problem, this paper studies 3D shape recognition when only very few training samples are available. Inspired by multi-task learning, we build a multi-branch network and introduce an auxiliary comparison module based on metric learning to exploit intra-class similarity and inter-class differences between samples. The network consists of a primary branch and an auxiliary branch, each trained on its own task with its own loss function, and a weighting hyper-parameter balances the loss terms. The primary branch outputs the predicted class and is updated with the cross-entropy loss; the auxiliary branch outputs similarity scores between samples and is updated with the mean squared error loss. To ensure that all feature vectors are projected into the same representation space, the two branches share one feature extraction module. Both branches are trained jointly, and only the primary branch is used in the testing phase to produce the classification result. Extensive experiments on two public 3D shape benchmark datasets show that, compared with traditional methods, the proposed architecture and training strategy significantly improve the discriminative power of the feature module and achieve better recognition results when only limited multi-view data is available.
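The two-branch objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted sum `ce + lam * mse`, the names `joint_loss`, `similarity` and `lam`, the fixed distance-based similarity (standing in for the learned comparison module), and the pair targets (1 for same class, 0 for different classes) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Primary-branch loss for one sample: -log p(label)."""
    return -math.log(softmax(logits)[label])


def similarity(f_a, f_b):
    """Hypothetical stand-in for the auxiliary comparison module.

    Maps the squared Euclidean distance between two shared-extractor
    feature vectors to a score in (0, 1]; the paper's module is learned.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(f_a, f_b))
    return math.exp(-d2)


def joint_loss(logits, labels, features, lam=0.5):
    """Return (total, ce, mse) with total = ce + lam * mse (assumed form)."""
    n = len(labels)
    # Primary branch: mean cross-entropy over the batch.
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n

    # Auxiliary branch: MSE between pairwise similarity scores and
    # assumed targets (1.0 for same-class pairs, 0.0 otherwise).
    sq_errors = []
    for i in range(n):
        for j in range(i + 1, n):
            target = 1.0 if labels[i] == labels[j] else 0.0
            score = similarity(features[i], features[j])
            sq_errors.append((score - target) ** 2)
    mse = sum(sq_errors) / len(sq_errors) if sq_errors else 0.0

    return ce + lam * mse, ce, mse


if __name__ == "__main__":
    # Toy batch: three samples, two classes, 2-D features.
    logits = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.5]]
    labels = [0, 1, 0]
    features = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0]]
    total, ce, mse = joint_loss(logits, labels, features, lam=0.5)
    print(f"total={total:.4f} ce={ce:.4f} mse={mse:.4f}")
```

In this sketch, setting `lam=0` reduces training to the plain classification loss. At test time only the primary-branch prediction (the arg-max of `logits`) would be used, which matches the abstract's description.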

Key words: 3D shape recognition, Auxiliary branch, Limited data, Multi-task learning, Multi-view 3D shape

CLC Number:

  • TP391.413