Computer Science (计算机科学), 2018, Vol. 45, Issue (5): 232-237. doi: 10.11896/j.issn.1002-137X.2018.05.040

Section: Graphics, Image and Pattern Recognition

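Application of Two-stream Faster R-CNN in RGB-D Hand Detection

LIU Zhuang, CHAI Xiu-juan, CHEN Xi-lin

  • LIU Zhuang: Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Information Science and Technology, ShanghaiTech University, Shanghai 201210; University of Chinese Academy of Sciences, Beijing 100049
  • CHAI Xiu-juan: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
  • CHEN Xi-lin: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Information Science and Technology, ShanghaiTech University, Shanghai 201210; University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2018-05-15  Published: 2018-07-25
  • Funding: This work was supported by the project "Research on 3D Sign Language Recognition with Large-scale Datasets" (61472398).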

Abstract: In many vision tasks involving human hands, such as human-computer interaction and sign language recognition, hand detection is a crucial preprocessing stage. With the development of RGB-D acquisition devices, the additional depth data can complement the conventionally used color data and thus provide a more powerful feature representation. Moreover, traditional detection methods rely on hand-crafted features such as skin color or HOG, which cannot represent hands well, whereas detection methods based on deep learning avoid this weakness by learning effective features automatically from data. To combine the advantages of RGB-D data and deep learning, this paper proposes a two-stream Faster R-CNN detection framework that fuses color and depth data. Building on the original Faster R-CNN framework, the method adds a depth stream and fuses it with the RGB stream at the feature level. Experimental results show that the proposed method achieves clearly higher detection precision than Faster R-CNN frameworks that either use RGB data alone or fuse RGB and depth at the data level. The proposed method can therefore effectively fuse color and depth data to improve hand detection performance.

Key words: Hand detection, Depth data, Deep learning, Two-stream Faster R-CNN
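The abstract contrasts two places where RGB and depth can be combined: at the data level (depth stacked onto the color image before a single network) and at the feature level (a separate stream per modality, with the resulting feature maps merged). The pure-Python sketch below illustrates only that difference in data flow. It is not the authors' implementation: the "backbone" is a hypothetical stand-in (`toy_backbone`: 2x2 average pooling followed by a 1x1 channel mix), channel-wise concatenation is assumed as the fusion operator, and the layer at which the paper actually fuses the streams is not stated in the abstract. The function names `data_level_fusion` and `feature_level_fusion` are illustrative.

```python
from typing import List

# A "feature map" here is a list of channels; each channel is a 2-D grid (list of rows).
Channel = List[List[float]]
FeatureMap = List[Channel]


def avg_pool2(ch: Channel) -> Channel:
    """2x2 average pooling with stride 2 (stand-in for backbone downsampling)."""
    h, w = len(ch), len(ch[0])
    return [
        [
            (ch[i][j] + ch[i][j + 1] + ch[i + 1][j] + ch[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def toy_backbone(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Hypothetical backbone: pool every input channel, then apply a 1x1 channel mix.

    weights[k][c] is the contribution of input channel c to output channel k,
    so len(weights) sets the number of output channels.
    """
    pooled = [avg_pool2(ch) for ch in x]
    h, w = len(pooled[0]), len(pooled[0][0])
    out: FeatureMap = []
    for row in weights:
        if len(row) != len(pooled):
            raise ValueError("need one weight per input channel")
        out.append(
            [
                [sum(wc * pooled[c][i][j] for c, wc in enumerate(row)) for j in range(w)]
                for i in range(h)
            ]
        )
    return out


def data_level_fusion(rgb: FeatureMap, depth: FeatureMap,
                      weights: List[List[float]]) -> FeatureMap:
    """Early fusion: stack depth as an extra input channel and run ONE shared backbone."""
    return toy_backbone(rgb + depth, weights)  # 3 + 1 = 4 input channels


def feature_level_fusion(rgb: FeatureMap, depth: FeatureMap,
                         rgb_weights: List[List[float]],
                         depth_weights: List[List[float]]) -> FeatureMap:
    """Two-stream fusion: a separate backbone per modality, then channel-wise
    concatenation of the two feature maps (assumed fusion operator).
    Assumes RGB and depth are registered and have the same spatial size."""
    rgb_feat = toy_backbone(rgb, rgb_weights)
    depth_feat = toy_backbone(depth, depth_weights)
    return rgb_feat + depth_feat  # list concatenation == concatenation along channels


if __name__ == "__main__":
    H = W = 4
    rgb = [[[float(c + i + j) for j in range(W)] for i in range(H)] for c in range(3)]
    depth = [[[float(i * W + j) for j in range(W)] for i in range(H)]]
    early = data_level_fusion(rgb, depth, [[0.25] * 4, [0.0, 0.0, 0.0, 1.0]])
    late = feature_level_fusion(rgb, depth, [[1 / 3] * 3, [1.0, 0.0, 0.0]], [[1.0]])
    print(len(early), len(early[0]), len(early[0][0]))  # 2 2 2
    print(len(late), len(late[0]), len(late[0][0]))     # 3 2 2
```

The structural difference is that in the two-stream version each modality keeps its own weights, so depth is not forced through filters shared with the color channels. The sketch shows the data flow only; it says nothing about the accuracy gain reported in the paper.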
