Computer Science, 2019, Vol. 46, Issue (6A): 279-283.

• Pattern Recognition and Image Processing •

Object Detection Algorithm Based on Context and Multi-scale Information Fusion

LV Pei-jian¹, CHEN Jia-peng², YUAN Fei¹, PENG Qiang², XIANG Yu³

  1. Henan Expressway Network Monitoring Charge Communication Service Company, Transportation Department of Henan Province, Zhengzhou 450000, China
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  3. School of Highway, Chang'an University, Xi'an 710064, China
  • Online: 2019-06-14  Published: 2019-07-02
  • Corresponding author: LV Pei-jian (born 1971), male, senior engineer. His main research interest is traffic electromechanical engineering. E-mail: 409676667@qq.com
  • About the authors: CHEN Jia-peng (born 1993), male, master's student; his main research interest is computer vision. YUAN Fei (born 1974), male, senior engineer; his main research interest is highway electromechanical engineering. PENG Qiang (born 1962), Ph.D., professor, CCF senior member; main research interests are video coding and transmission, image processing, virtual reality, and intelligent transportation. XIANG Yu (born 1987), male, Ph.D., engineer; his main research interests are traffic engineering and transportation informatization.

Abstract: Rapid advances in convolutional neural networks (CNNs) have greatly improved object detection performance. SqueezeDet, however, uses neither multi-scale nor context information. To address this, the paper combines skip connections and shortcut connections to aggregate multi-scale feature maps, uses dilated convolution to enlarge the convolutional receptive field and capture more context, and proposes a context-based multi-scale object detection model that improves detection accuracy and robustness in complex scenes. The model fuses feature maps at three resolutions. The smallest and middle feature maps gather context through dilated convolutions with different sampling rates. The smallest feature map is then upsampled to twice its resolution by bilinear interpolation, and the largest feature map is down-sampled by a convolutional layer with stride 2 to the size of the middle feature map, so that all three maps have the same size and can be fused. Shortcut connections also link feature maps of different sizes so that information lost in the smaller maps can be recovered from the larger ones. Evaluated on KITTI, a public international benchmark dataset for autonomous driving, the model improves accuracy over SqueezeDet by about 5% to 6% and runs at up to 30 fps on a GPU.

Key words: Convolutional neural network, Dilated convolution, Shortcut connection, Skip connection

CLC Number: TP391
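The size bookkeeping behind the fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 kernel size, the dilation rates, the padding values, the example map sizes (4, 8, 16), and the helper names `effective_kernel`, `conv_out_size`, `bilinear_upsample_2x`, and `fusion_sizes` are all assumptions chosen for illustration. The sketch only shows that a dilated convolution with matching padding preserves the map size, that a stride-2 convolution halves it, and that a 2x bilinear upsampling doubles it, so the three maps end up the same size.

```python
def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel once dilation inserts gaps between taps."""
    return k + (k - 1) * (dilation - 1)


def conv_out_size(n, k, stride=1, dilation=1, pad=0):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * pad - effective_kernel(k, dilation)) // stride + 1


def bilinear_upsample_2x(fm):
    """Double the resolution of a 2-D map (list of lists of floats).

    Uses the corner-aligned variant of bilinear interpolation (an assumption;
    the abstract does not say which variant is used).
    """
    h, w = len(fm), len(fm[0])
    out_h, out_w = 2 * h, 2 * w
    out = []
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1)      # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1)  # source column coordinate
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = fm[y0][x0] * (1 - fx) + fm[y0][x1] * fx
            bottom = fm[y1][x0] * (1 - fx) + fm[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fusion_sizes(small, middle, large):
    """Side lengths of the three maps just before fusion (hypothetical settings)."""
    # Smallest map: dilated 3x3 conv (rate 4, pad 4 keeps the size), then 2x upsample.
    small_ctx = conv_out_size(small, 3, dilation=4, pad=4)
    small_up = 2 * small_ctx
    # Middle map: dilated 3x3 conv with a different rate (rate 2, pad 2 keeps the size).
    middle_ctx = conv_out_size(middle, 3, dilation=2, pad=2)
    # Largest map: ordinary 3x3 conv with stride 2 and pad 1 halves the size.
    large_down = conv_out_size(large, 3, stride=2, pad=1)
    return small_up, middle_ctx, large_down
```

For example, `fusion_sizes(4, 8, 16)` yields three equal side lengths, which is the precondition for fusing the maps. How the maps are combined once aligned (concatenation or element-wise addition) is not stated in the abstract, so the sketch stops at the size check.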