Computer Science ›› 2021, Vol. 48 ›› Issue (10): 212-219. doi: 10.11896/jsjkx.200900005

• Computer Graphics & Multimedia •

Light Field Depth Estimation Method Based on Encoder-decoder Architecture

YAN Xu1,2,3, MA Shuai1,2,3, ZENG Feng-jiao1,2,3, GUO Zheng-hua1,2,3, WU Jun-long1,2,3, YANG Ping1,2, XU Bing1,2

  1 Key Laboratory on Adaptive Optics, Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
  2 Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
  3 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-09-01 Revised: 2021-02-02 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: XU Bing (bing_xu_ioe@163.com)
  • Author contact: 18123087889@163.com
  • About author: YAN Xu, born in 1995, postgraduate. His main research interests include computer vision and deep learning.
    XU Bing, born in 1960, senior research scientist, Ph.D. supervisor. His research interests include the application of adaptive optics to improving laser beam quality, wavefront detector development, and applications of light field cameras.
  • Supported by:
    National Natural Science Foundation of China (J19K004).

Abstract: To address the long computation time and low accuracy of existing light field depth estimation methods, a light field depth estimation method based on an encoder-decoder architecture that fuses structural features of the light field is proposed. The method is built on a convolutional neural network and computes end to end: a single light field image is enough to obtain the disparity information of the scene, so the computational cost is far lower than that of traditional methods and the computation time is greatly reduced. To improve accuracy, the network takes the multi-orientation epipolar plane image volumes (EPI-volumes) of the light field image as input. A multi-stream encoding module first extracts features from the input, and an encoder-decoder architecture with skip connections then aggregates them, so that per-pixel disparity estimation can draw on context information from the neighborhood of the target pixel. In addition, the model uses convolutional blocks of different depths to extract structural features of the scene from the central view image and introduces these features into the corresponding skip connections, which gives disparity map prediction an additional edge feature reference and further improves accuracy. Experiments on the HCI 4D Light Field Benchmark show that the bad pixel rate (BadPix) and mean square error (MSE) of the proposed method are 31.2% and 54.6% lower, respectively, than those of the comparison methods. For the light field images in the benchmark, the average computation time of depth estimation is 1.2 s, much faster than the comparison methods.
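The abstract does not spell out how the multi-orientation EPI-volumes are assembled, so the following is only a minimal sketch under stated assumptions: the light field is an N x N grid of sub-aperture views with N odd, and four orientations (horizontal, vertical, and the two diagonals through the central view) are used. The function names `directional_view_stacks` and `horizontal_epi` are hypothetical and are not taken from the paper.

```python
def directional_view_stacks(light_field):
    """Pick the views lying on four lines through the central view.

    light_field[u][v] is the sub-aperture view at angular row u and
    angular column v. The four orientations are an assumption; the
    paper's abstract only says "multi-orientation".
    """
    n = len(light_field)
    if n % 2 == 0 or any(len(row) != n for row in light_field):
        raise ValueError("expected a square angular grid with odd side length")
    c = n // 2  # angular index of the central view
    return {
        "horizontal": [light_field[c][v] for v in range(n)],
        "vertical": [light_field[u][c] for u in range(n)],
        "diagonal": [light_field[i][i] for i in range(n)],
        "anti_diagonal": [light_field[i][n - 1 - i] for i in range(n)],
    }


def horizontal_epi(horizontal_stack, y):
    """EPI of a horizontal stack: image row y taken from every view.

    The result has one line per view, so a scene point traces a slanted
    line whose slope depends on its disparity.
    """
    return [view[y] for view in horizontal_stack]


# Toy 3 x 3 light field of 2 x 4 images; each pixel stores its own
# (u, v, y, x) coordinates so the selection is easy to inspect.
toy = [[[[(u, v, y, x) for x in range(4)] for y in range(2)]
        for v in range(3)] for u in range(3)]
stacks = directional_view_stacks(toy)
epi = horizontal_epi(stacks["horizontal"], y=1)
print(len(epi), len(epi[0]))  # number of views, image width
```

The sketch only indexes views, so it works unchanged on nested lists or on array types that support the same indexing. In a multi-stream design, each of the four stacks would feed its own encoding branch.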

Key words: Context information, Depth estimation, Encoder-decoder, Epipolar plane image, Light field

CLC Number: 

  • TP391
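The BadPix and MSE figures quoted in the abstract are relative reductions of two per-pixel disparity error metrics. As a rough illustration, the sketch below computes both metrics on small nested lists. The 0.07 threshold and the factor of 100 on the MSE follow the convention commonly reported for the HCI 4D Light Field Benchmark; the abstract does not state which settings were used, so both are assumptions, and all function names and numbers here are hypothetical.

```python
def _abs_errors(pred, gt):
    """Flatten two equally shaped 2D disparity maps into absolute errors."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    return [abs(p - g) for p_row, g_row in zip(pred, gt)
            for p, g in zip(p_row, g_row)]


def badpix(pred, gt, threshold=0.07):
    """Percentage of pixels whose absolute error exceeds the threshold.

    The default threshold is an assumed benchmark convention.
    """
    errors = _abs_errors(pred, gt)
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


def mse_x100(pred, gt):
    """Mean squared disparity error, scaled by 100 (assumed convention)."""
    errors = _abs_errors(pred, gt)
    return 100.0 * sum(e * e for e in errors) / len(errors)


def relative_reduction(ours, baseline):
    """How much lower `ours` is than `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Hypothetical 2 x 2 disparity maps, not data from the paper.
pred = [[0.0, 0.5], [1.0, 1.2]]
gt = [[0.0, 0.4], [1.0, 1.0]]
print(badpix(pred, gt))              # 50.0: two of the four pixels are off by more than 0.07
print(mse_x100(pred, gt))
print(relative_reduction(4.0, 5.0))  # 20.0
```

A statement such as "31.2% lower" corresponds to `relative_reduction` applied to the two methods' scores on the same metric.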