Computer Science, 2022, Vol. 49, Issue (11): 126-133. doi: 10.11896/jsjkx.220500193

• Computer Graphics & Multimedia •

Variational Domain Adaptation Driven Semantic Segmentation of Urban Scenes

JIN Yu-jie1,2, CHU Xu1,3, WANG Ya-sha1,4, ZHAO Jun-feng1,2

    1 Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China
    2 School of Computer Science and Technology, Peking University, Beijing 100871, China
    3 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
    4 National Engineering Research Center for Software Engineering, Peking University, Beijing 100871, China
  • Received: 2022-05-20 Revised: 2022-07-22 Online: 2022-11-15 Published: 2022-11-03
  • Corresponding author: WANG Ya-sha (wangyasha@pku.edu.cn)
  • Author e-mail: jyj17pku@pku.edu.cn
  • About author: JIN Yu-jie, born in 1999, postgraduate. His main research interests include machine learning and data mining.
    WANG Ya-sha, born in 1975, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include smart city and big data analysis.
  • Supported by:
    National Natural Science Foundation of China (62172011).

Abstract: Semantic segmentation of urban scenes aims to identify and segment pedestrians, obstacles, roads, signs and other elements in an image, providing vehicles with information about free space on the road. It is one of the key technologies of autonomous driving. High-performance semantic segmentation systems rely heavily on large amounts of real annotated training data. However, labeling every pixel in an image is costly and often impractical. A low-cost alternative is to collect photo-realistic synthetic images from video games, where pixel-level annotations can be generated automatically, and to use them to help a machine learning model segment real-world images; this is a domain adaptation problem. Unlike current mainstream domain adaptation methods for semantic segmentation, which are based on Vapnik-Chervonenkis (VC) dimension theory or Rademacher complexity theory, our method is inspired by a PAC-Bayes upper bound on the target-domain Gibbs risk that is compatible with pseudo-label functions. It considers the average case over the hypothesis space rather than the worst case, and thereby avoids the over-constraining of the domain discrepancy in the latent space that prevents mainstream methods from effectively estimating and optimizing the upper bound on the target-domain generalization error. Guided by this idea, this paper proposes a variational inference method for semantic segmentation adaptation (VISA). VISA performs variational inference with the dropout variational family to approximate the ideal posterior distribution over the hypothesis space, which also yields an approximate Bayes classifier quickly, and it makes the estimate of the risk upper bound more accurate by minimizing entropy on the target domain and filtering pixels. Adaptation experiments on the GTA5→Cityscapes urban scene semantic segmentation benchmark show that VISA improves the mean intersection over union (mIoU) by 0.5% to 6.6% over baseline methods and achieves high accuracy on key urban scene elements such as pedestrians and vehicles.

Key words: Semantic segmentation, Domain adaptation, PAC-Bayes theory, Variational inference, Deep neural network
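
The mechanism summarized in the abstract (sampling dropout masks to approximate a Bayes classifier, then using per-pixel entropy to decide which target pixels to trust) can be illustrated with a small sketch. This is not the authors' implementation: the toy linear per-pixel classifier, the function names `mc_dropout_predict`, `pixel_entropy` and `confident_pixel_mask`, the dropout rate, and the entropy threshold are all assumptions made for illustration, and entropy thresholding is only one plausible way to filter pixels.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mc_dropout_predict(features, weights, n_samples=20, drop_prob=0.5, rng=None):
    """Average class probabilities over random dropout masks.

    features: (H, W, D) per-pixel feature vectors (hypothetical input).
    weights:  (D, C) weights of a toy linear classifier standing in for
              a segmentation network (assumption, not the paper's model).
    Returns an (H, W, C) array: a Monte Carlo estimate of the
    posterior-averaged (approximate Bayes) prediction.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    keep = 1.0 - drop_prob
    probs = np.zeros(features.shape[:2] + (weights.shape[1],))
    for _ in range(n_samples):
        # Each mask corresponds to one hypothesis drawn from the
        # dropout variational family.
        mask = rng.random(features.shape) < keep
        dropped = features * mask / keep  # inverted-dropout scaling
        probs += softmax(dropped @ weights)
    return probs / n_samples


def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy of the class distribution at every pixel."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)


def confident_pixel_mask(probs, max_entropy_ratio=0.5):
    """Keep pixels whose entropy is below a fraction of log(C).

    The 0.5 ratio is an arbitrary illustrative threshold.
    """
    n_classes = probs.shape[-1]
    return pixel_entropy(probs) <= max_entropy_ratio * np.log(n_classes)


# Toy usage on random data: 4x6 "image", 8 features, 3 classes.
rng = np.random.default_rng(42)
features = rng.normal(size=(4, 6, 8))
weights = rng.normal(size=(8, 3))

probs = mc_dropout_predict(features, weights, rng=rng)
labels = probs.argmax(axis=-1)      # approximate Bayes prediction per pixel
mask = confident_pixel_mask(probs)  # pixels retained after filtering
entropy_loss = pixel_entropy(probs)[mask].mean() if mask.any() else 0.0
```

In a real training loop, the quantity `entropy_loss` would be minimized on unlabeled target-domain images alongside a supervised loss on the synthetic source domain; here it is only computed once to show the data flow.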

CLC Number: TP181