Computer Science, 2019, Vol. 46, Issue (5): 228-234. doi: 10.11896/j.issn.1002-137X.2019.05.035

Multi-semantic Interaction Based Iterative Scene Understanding Framework

YAO Tuo-zhong1, ZUO Wen-hui2, AN Peng1, SONG Jia-tao1   

  1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo, Zhejiang 315016, China
  2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
  • Published:2019-05-15

Abstract: Traditional feed-forward visual systems have been widely used for years, but they share one serious defect: they cannot correct their own mistakes while running, which degrades their performance. This paper proposes a simple interactive framework that resolves the semantic uncertainty of a scene through the cooperation of multiple visual analysis processes, thereby optimizing scene understanding. In this framework, three classic scene understanding algorithms serve as visual analysis modules, and their outputs, such as surface layout, boundary, depth, viewpoint, and object class, are shared with one another through contextual interaction so that each module improves its own performance iteratively. The framework needs no hand-crafted constraints, and new models can be added without large modifications to the original framework and algorithms, which gives it good scalability. Experimental results on the Geometric Context dataset show that this system, built on the interaction of intrinsic information, is more flexible and performs better than traditional feed-forward systems: the mean accuracy of surface layout, boundary, and viewpoint estimation increases by more than 5%, and the mean accuracy of object detection increases by more than 6%. This approach can be an efficient way of improving traditional visual systems.
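
The iterative exchange described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the module names (`surface_layout`, `boundary_depth`, `object_viewpoint`), the function `iterate_scene_understanding`, the synchronous update schedule, and the fixed number of rounds are all hypothetical, and the toy modules only record which outputs they received instead of analyzing an image.

```python
from typing import Any, Callable, Dict

# Hypothetical types: a module maps (image, other modules' outputs) to its own output.
Context = Dict[str, Any]
Module = Callable[[Any, Context], Any]


def iterate_scene_understanding(image: Any, modules: Dict[str, Module], rounds: int = 3) -> Context:
    """Re-run every module for a fixed number of rounds (an assumed schedule),
    giving each one the other modules' outputs from the previous round."""
    context: Context = {}
    for _ in range(rounds):
        updated: Context = {}
        for name, module in modules.items():
            # Each module reads only the outputs produced by the *other* modules.
            others = {k: v for k, v in context.items() if k != name}
            updated[name] = module(image, others)
        context = updated
    return context


def make_toy_module(name: str) -> Module:
    """Build a stand-in module that reports which shared outputs it was given."""
    def module(image: Any, others: Context) -> str:
        return f"{name}<-" + ",".join(sorted(others))
    return module


# Assumed module names, loosely following the outputs listed in the abstract.
TOY_MODULES: Dict[str, Module] = {
    name: make_toy_module(name)
    for name in ("surface_layout", "boundary_depth", "object_viewpoint")
}
```

In the first round no shared context exists yet, so each toy module behaves like a plain feed-forward stage; from the second round on, each one receives the other two outputs. Adding a further module in this sketch means adding one entry to the dictionary, which mirrors the extensibility claimed in the abstract.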

Key words: Boundary/Depth estimation, Iterative scene understanding, Multi-semantic interaction, Object/Viewpoint detection, Surface layout estimation

CLC Number: TP391.4