Real-time Binocular Depth Estimation Algorithm Based on Semantic Edge Drive

Computer Science, 2021, Vol. 48, Issue (9): 216-222. doi: 10.11896/jsjkx.200800203

• Computer Graphics & Multimedia •

ZHANG Peng, WANG Xin-qing, XIAO Yi, DUAN Bao-guo, XU Hong-hui

Department of Mechanical Engineering, College of Field Engineering, Army Engineering University, Nanjing 210007, China

Received: 2020-08-29 Revised: 2020-09-08 Online: 2021-09-15 Published: 2021-09-10
Corresponding author: WANG Xin-qing (wwwxxxqqq@126.com)
About author: ZHANG Peng (ZPhlgfs19951027@163.com), born in 1995, postgraduate. His main research interests include deep learning, computer vision and point cloud processing.
WANG Xin-qing, born in 1963, Ph.D, professor, Ph.D supervisor. His main research interests include intelligent signal processing and deep learning.
Supported by:
National Natural Science Foundation of China (61671470), National Basic Research Program of China (2016YFC0802904) and China Postdoctoral Science Foundation (2017M623423)

Abstract

Abstract: Stereo matching in ill-posed regions suffers from blurred disparity edges, unsmooth disparity, discontinuous disparity within a single object, and holes. To address these problems, a lightweight real-time binocular depth estimation algorithm is proposed. Semantic label maps obtained by semantic segmentation of the scene image and edge detail maps obtained by edge detection serve as auxiliary losses, and the ground truth map serves as the main loss; together they form a joint loss function that better supervises the generation of the disparity map. In addition, a lightweight feature extraction module is constructed to reduce redundancy in the feature extraction stage, which simplifies the feature extraction steps and makes the network faster and lighter. Finally, a coarse-to-fine strategy progressively refines the disparity map: the low-resolution disparity map is warped and fused with the high-resolution feature map to generate disparity maps at different scales stage by stage, so that detail features are gradually enriched and the final accurate disparity map is obtained. The method achieves a 3px error rate of 1.72% on the KITTI 2012 dataset. On the Middlebury 2014 dataset, the error rates are 1.23% on Vintage, 2.23% on Playroom, and 1.65% on Recycle. On the Scene Flow dataset, the computation time is as low as 0.76 s with a memory footprint of 2.4 G. The method significantly improves the accuracy and computational efficiency of stereo matching in ill-posed regions, meets the real-time requirements of engineering practice, and has important guiding significance for real-time 3D reconstruction tasks.

Key words: Edge extraction, End-to-end network, From coarse to fine, Semantic understanding, Stereo matching
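
The abstract names two quantities that a short sketch can make concrete: the joint loss (a main disparity loss plus semantic and edge auxiliary losses) and the 3px error rate used on KITTI 2012. The abstract does not give the form of the individual loss terms or their weights, so the sketch below rests on assumptions: the main loss is taken to be a smooth L1 loss averaged over pixels that have ground truth (a common choice in stereo networks, not confirmed by this page), the auxiliary losses are passed in as precomputed scalars, and the weights `w_semantic` and `w_edge` are hypothetical. The error rate follows the usual convention of counting valid pixels whose absolute disparity error exceeds the threshold.

```python
from typing import List, Optional, Sequence

# Disparity maps are nested rows of floats; in the ground truth,
# None marks a pixel with no measurement (sparse ground truth).
PredMap = Sequence[Sequence[float]]
GtMap = Sequence[Sequence[Optional[float]]]


def valid_abs_errors(pred: PredMap, gt: GtMap) -> List[float]:
    """Absolute disparity errors over pixels that have ground truth."""
    return [
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
        if g is not None
    ]


def n_px_error_rate(pred: PredMap, gt: GtMap, threshold: float = 3.0) -> float:
    """Fraction of valid pixels whose error exceeds `threshold` pixels."""
    errors = valid_abs_errors(pred, gt)
    if not errors:
        raise ValueError("ground truth contains no valid pixels")
    return sum(1 for e in errors if e > threshold) / len(errors)


def mean_smooth_l1(pred: PredMap, gt: GtMap) -> float:
    """Assumed main loss: smooth L1 averaged over valid pixels."""
    errors = valid_abs_errors(pred, gt)
    if not errors:
        raise ValueError("ground truth contains no valid pixels")
    return sum(0.5 * e * e if e < 1.0 else e - 0.5 for e in errors) / len(errors)


def joint_loss(main: float, semantic: float, edge: float,
               w_semantic: float = 0.1, w_edge: float = 0.1) -> float:
    """Main loss plus weighted auxiliary losses (weights are hypothetical)."""
    return main + w_semantic * semantic + w_edge * edge


if __name__ == "__main__":
    pred = [[10.0, 12.0], [20.0, 30.0]]
    gt = [[10.5, 16.0], [None, 30.0]]  # one pixel lacks ground truth
    main = mean_smooth_l1(pred, gt)
    # The auxiliary values below are placeholders, not outputs of real networks.
    total = joint_loss(main, semantic=0.8, edge=0.4)
    print(f"3px error rate: {n_px_error_rate(pred, gt):.2%}")
    print(f"joint loss: {total:.4f}")
```

Keeping the auxiliary terms as weighted additions means the semantic and edge supervision can be scaled or disabled without touching the main disparity loss, which is one plausible reading of "auxiliary" in the abstract.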

CLC Number:

TP391.41

ZHANG Peng, WANG Xin-qing, XIAO Yi, DUAN Bao-guo, XU Hong-hui. Real-time Binocular Depth Estimation Algorithm Based on Semantic Edge Drive[J]. Computer Science, 2021, 48(9): 216-222. https://doi.org/10.11896/jsjkx.200800203

