Computer Science, 2019, Vol. 46, Issue (5): 228-234. doi: 10.11896/j.issn.1002-137X.2019.05.035

• Graphics, Image and Pattern Recognition •

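Multi-semantic Interaction Based Iterative Scene Understanding Framework

YAO Tuo-zhong1, ZUO Wen-hui2, AN Peng1, SONG Jia-tao1

  1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo, Zhejiang 315016, China
  2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
  • Published: 2019-05-15
  • About the authors: YAO Tuo-zhong (born 1983), male, Ph.D., lecturer; his main research interests are computer vision and machine learning; E-mail: thomasyao@zju.edu.cn. ZUO Wen-hui (born 1984), male, Ph.D. candidate; his main research interests are computer vision and machine learning; E-mail: wenhuizuo@126.com (corresponding author). AN Peng (born 1981), male, Ph.D., professor; his main research interests are embedded systems and mobile robots. SONG Jia-tao (born 1966), male, Ph.D., professor; his main research interests are image processing and pattern recognition.
  • Funding:
    Supported by the National Natural Science Foundation of China for Young Scientists (61502256), the Key Research and Development Program of Zhejiang Province (2018C01086) and the Natural Science Foundation of Ningbo (2018A610160).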

Abstract: Feed-forward visual systems are widely used, but they share one major defect: an error made at any stage cannot be corrected in time, which degrades the final performance of the system. This paper proposes a simple interactive framework in which the semantic uncertainty of a scene is resolved and optimized through the cooperation of different visual analysis processes. The framework uses three classic scene understanding algorithms as visual analysis modules. The modules exchange the contextual semantics they output, namely surface layout, boundary, depth, viewpoint and object class, so that each improves its own performance progressively. The method requires no hand-set constraints, and new modules can be inserted on demand without major modification of the existing framework and algorithms, so it has good extensibility. Experimental results on the Geometric Context dataset show that, after several iterations, this feedback design based on intrinsic information interaction effectively compensates for the shortcomings of feed-forward systems: the mean accuracy of surface layout, boundary and viewpoint estimation increases by more than 5%, and the mean accuracy of object detection increases by more than 6%. This approach may become one way to improve the performance of visual systems.

Key words: Boundary/Depth estimation, Iterative scene understanding, Multi-semantic interaction, Object/Viewpoint detection, Surface layout estimation

CLC Number:

  • TP391.4
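
The interaction loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `run_iterative_framework`, the `Context` dictionary, the toy modules and their numeric "estimates" are all hypothetical, and the grouping of outputs into three modules (surface layout, boundary/depth, object/viewpoint) is only an assumption based on the keyword list. The sketch shows the general pattern of pluggable modules that read each other's outputs from a shared context and refine their own over several rounds.

```python
from typing import Callable, Dict, List

# Shared pool of semantic cues. The keys and float values are placeholders,
# not the representations used in the paper.
Context = Dict[str, float]
Module = Callable[[Context], Context]


def run_iterative_framework(
    modules: List[Module], context: Context, num_iterations: int = 3
) -> Context:
    """Run every module once per round, letting each read the others' cues."""
    context = dict(context)  # work on a copy so the caller's dict is untouched
    for _ in range(num_iterations):
        for module in modules:
            # A module returns only the cues it produces or refines.
            context.update(module(context))
    return context


# Toy stand-ins for the three visual analysis modules (hypothetical logic:
# each one blends its previous estimate with a cue from another module).
def surface_layout_module(ctx: Context) -> Context:
    return {"surface_layout": 0.5 * (ctx.get("surface_layout", 0.0) + ctx.get("depth", 1.0))}


def boundary_depth_module(ctx: Context) -> Context:
    layout = ctx.get("surface_layout", 1.0)
    return {
        "boundary": 0.5 * (ctx.get("boundary", 0.0) + layout),
        "depth": 0.5 * (ctx.get("depth", 0.0) + layout),
    }


def object_viewpoint_module(ctx: Context) -> Context:
    depth = ctx.get("depth", 1.0)
    return {
        "objects": 0.5 * (ctx.get("objects", 0.0) + depth),
        "viewpoint": 0.5 * (ctx.get("viewpoint", 0.0) + depth),
    }


if __name__ == "__main__":
    modules = [surface_layout_module, boundary_depth_module, object_viewpoint_module]
    print(run_iterative_framework(modules, {}, num_iterations=3))
```

Under these assumptions, adding a new analysis module only means appending another function to the `modules` list, which mirrors the extensibility the abstract claims without implying anything about how the paper actually couples its algorithms.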