Object Detection Algorithm Based on Context and Multi-scale Information Fusion

Computer Science, 2019, Vol. 46, Issue (6A): 279-283.

LV Pei-jian¹, CHEN Jia-peng², YUAN Fei¹, PENG Qiang², XIANG Yu³

Henan Expressway Network Monitoring Charge Communication Service Company, Transportation Department of Henan Province, Zhengzhou 450000, China¹;
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China²;
School of Highway, Chang'an University, Xi'an 710064, China³

Online: 2019-06-14 Published: 2019-07-02
Corresponding author: LV Pei-jian (born 1971), male, senior engineer; main research interest: traffic electromechanical engineering. E-mail: 409676667@qq.com
About the authors: CHEN Jia-peng (born 1993), male, master's student; main research interest: computer vision. YUAN Fei (born 1974), male, senior engineer; main research interest: highway electromechanical engineering. PENG Qiang (born 1962), Ph.D., professor, senior member of CCF; main research interests: video coding and transmission, image processing, virtual reality, and intelligent transportation. XIANG Yu (born 1987), male, Ph.D., engineer; main research interests: traffic engineering and transportation informatization.

Abstract

The rapid development of convolutional neural networks (CNNs) has greatly improved object detection performance. The SqueezeDet algorithm uses neither multi-scale nor context information. To address this, this paper combines skip connections and shortcut connections to aggregate multi-scale feature maps, uses dilated convolution to enlarge the convolutional receptive field and capture more context, and proposes a context-based multi-scale object detection model that improves the accuracy and robustness of detection in complex scenes. The model fuses feature maps at three resolutions. The smallest and middle-sized feature maps aggregate context through dilated convolutions with different sampling rates. The smallest feature map is then upsampled to twice its resolution by bilinear interpolation, and the largest feature map is downsampled by a convolutional layer with stride 2, so that all three maps match the size of the middle feature map and can be fused. In addition, shortcut connections link feature maps of different sizes to recover lost information from the larger feature maps. The model was evaluated on KITTI, an international public benchmark dataset for autonomous driving. Compared with SqueezeDet, accuracy improves by about 5% (6% according to the paper's English abstract), and inference runs at up to 30 fps on a GPU.

Key words: Convolutional neural network, Dilated convolution, Shortcut connection, Skip connection
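
The fusion scheme described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`conv2d`, `bilinear_upsample2x`, `fuse`), the dilation rates (2 and 4), the channel counts, the random weights, and the use of channel concatenation as the fusion operator are all hypothetical, since the abstract does not specify them. The shortcut connections are omitted.

```python
import numpy as np


def conv2d(x, w, stride=1, rate=1):
    """Zero-padded 2-D convolution with optional stride and dilation rate.

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k.
    """
    k = w.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros(((h - 1) // stride + 1, (wd - 1) // stride + 1, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Kernel taps sit `rate` pixels apart; outputs are taken every `stride` pixels.
            patch = xp[i * rate:i * rate + h:stride, j * rate:j * rate + wd:stride]
            out += patch @ w[i, j]
    return out


def bilinear_upsample2x(x):
    """Double the spatial resolution of x (H, W, C) by bilinear interpolation."""
    def axis_weights(n):
        # Map each output pixel centre back to source coordinates.
        src = np.clip((np.arange(2 * n) + 0.5) / 2 - 0.5, 0, n - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        return lo, hi, src - lo

    r0, r1, fr = axis_weights(x.shape[0])
    c0, c1, fc = axis_weights(x.shape[1])
    rows = x[r0] * (1 - fr)[:, None, None] + x[r1] * fr[:, None, None]
    return rows[:, c0] * (1 - fc)[None, :, None] + rows[:, c1] * fc[None, :, None]


def fuse(small, middle, large, w_small, w_middle, w_down, rates=(2, 4)):
    """Bring three feature maps to the middle resolution and concatenate them."""
    ctx_small = conv2d(small, w_small, rate=rates[0])     # context on the smallest map
    ctx_middle = conv2d(middle, w_middle, rate=rates[1])  # context on the middle map
    up = bilinear_upsample2x(ctx_small)                   # smallest map, 2x larger
    down = conv2d(large, w_down, stride=2)                # largest map, 2x smaller
    # Assumption: fusion is channel concatenation.
    return np.concatenate([up, ctx_middle, down], axis=-1)


rng = np.random.default_rng(0)
large = rng.normal(size=(16, 16, 4))
middle = rng.normal(size=(8, 8, 8))
small = rng.normal(size=(4, 4, 8))
w_small = rng.normal(size=(3, 3, 8, 8))
w_middle = rng.normal(size=(3, 3, 8, 8))
w_down = rng.normal(size=(3, 3, 4, 8))

fused = fuse(small, middle, large, w_small, w_middle, w_down)
print(fused.shape)  # (8, 8, 24)
```

A 3×3 kernel with dilation rate r covers a (2r+1)×(2r+1) window while keeping the same nine weights, which is how dilated convolution widens the receptive field without adding parameters.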

CLC Number: TP391

LV Pei-jian, CHEN Jia-peng, YUAN Fei, PENG Qiang, XIANG Yu. Object Detection Algorithm Based on Context and Multi-scale Information Fusion[J]. Computer Science, 2019, 46(6A): 279-283. https://doi.org/
