Computer Science, 2018, Vol. 45, Issue (5): 232-237. doi: 10.11896/j.issn.1002-137X.2018.05.040


Application of Two-stream Faster R-CNN in RGB-D Hand Detection

LIU Zhuang, CHAI Xiu-juan and CHEN Xi-lin   

Online: 2018-05-15   Published: 2018-07-25

Abstract: In most vision tasks involving human hands, such as human-computer interaction and sign language recognition, hand detection is an important preprocessing step. With the development of RGB-D data acquisition equipment, depth data can effectively complement color data, and together they provide a more powerful feature representation. Traditional detection methods based on hand-crafted features (skin color or HOG) cannot represent hands well, whereas many detection methods based on deep learning avoid this weakness by learning effective features from data. To combine the advantages of RGB-D data and deep learning, this paper proposes a two-stream Faster R-CNN detection framework. The proposed method adds an extra depth stream and combines it with the RGB stream at the feature level. Experimental results show that the proposed method achieves higher detection precision than Faster R-CNN frameworks that use RGB alone or that fuse RGB and depth at the data level. Thus, the proposed method fuses color and depth data effectively and improves hand detection performance.

Key words: Hand detection, Depth data, Deep learning, Two-stream Faster R-CNN
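
The difference between data-level and feature-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's network: the abstract does not say how the two streams are merged, so the channel concatenation followed by a 1x1 convolution used here is an assumption, and every name (`conv1x1`, `data_level_fusion`, `feature_level_fusion`, the weight shapes) is hypothetical. Each "stream" is reduced to a single pointwise convolution with random weights so the example stays small.

```python
import numpy as np

def conv1x1(x, weight):
    """Pointwise (1x1) convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def relu(x):
    return np.maximum(x, 0.0)

def data_level_fusion(rgb, depth, weight):
    """Data-level fusion: stack RGB and depth into one 4-channel input
    and pass it through a single stream."""
    rgbd = np.concatenate([rgb, depth], axis=0)  # shape (4, H, W)
    return relu(conv1x1(rgbd, weight))

def feature_level_fusion(rgb, depth, w_rgb, w_depth, w_fuse):
    """Feature-level fusion: each modality has its own stream; the resulting
    feature maps are concatenated along channels and mixed by a 1x1 conv
    (an assumed fusion operator, not confirmed by the abstract)."""
    f_rgb = relu(conv1x1(rgb, w_rgb))        # RGB stream features
    f_depth = relu(conv1x1(depth, w_depth))  # depth stream features
    fused = np.concatenate([f_rgb, f_depth], axis=0)
    return relu(conv1x1(fused, w_fuse))

rng = np.random.default_rng(0)
H, W, C = 8, 8, 16                 # toy spatial size and feature channels
rgb = rng.random((3, H, W))        # color image, channels first
depth = rng.random((1, H, W))      # aligned depth map

early = data_level_fusion(rgb, depth, rng.standard_normal((C, 4)))
late = feature_level_fusion(
    rgb, depth,
    rng.standard_normal((C, 3)),
    rng.standard_normal((C, 1)),
    rng.standard_normal((C, 2 * C)),
)
print(early.shape, late.shape)
```

Both variants produce a feature map of the same shape that a shared region proposal network and detection head could consume; the difference is that the feature-level variant lets each modality learn its own filters before the two are combined.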
