Computer Science, 2014, Vol. 41, Issue (3): 263-266.


New Feature Extraction Method Based on Bottleneck Deep Belief Networks and its Application in Language Recognition

LI Jin-hui, YANG Jun-an and WANG Yi

Online: 2018-11-14; Published: 2018-11-14

Abstract: In language recognition, each frame carries too little information on its own, so traditional mel-frequency cepstral coefficient (MFCC) features are easily corrupted by noise. Meanwhile, the usual shifted delta cepstral (SDC) feature extraction relies on manually chosen parameters, which adds uncertainty to recognition performance. To overcome these drawbacks, deep learning was introduced, and a novel feature extraction approach based on it, named BN-DBN, was proposed. Comparative experiments on the bottleneck layer size, the number of hidden layers and the position of the bottleneck layer were carried out on the NIST 2007 database. Experimental results show that bottleneck features extracted with deep belief networks are more effective for language recognition than traditional features.

Key words: Language recognition, Bottleneck features, Deep belief networks
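As an illustration only, the general idea behind bottleneck features can be sketched in plain Python. A network with a deliberately narrow hidden layer is trained on a supervised classification task (the abstract does not say which targets are used). At extraction time, each context-stacked input frame is forwarded only as far as that narrow layer, and its outputs become the new feature vector. The abstract gives no network configuration, so everything below is an assumption: the 13-dimensional input frames, the ±5-frame context window, the layer sizes, sigmoid hidden units (as in an RBM-initialised DBN), and returning the bottleneck's linear (pre-activation) output. The hypothetical `init_layers` function only fills in random weights. In a real BN-DBN these weights would typically come from layer-wise RBM pre-training followed by supervised fine-tuning. The three factors varied in the experiments correspond to the bottleneck width (`sizes[bn_index + 1]`), the number of hidden layers (`len(sizes) - 2`) and the bottleneck position (`bn_index`).

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix stored as rows, bias vector)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stack_context(frames: Sequence[Vector], left: int, right: int) -> List[Vector]:
    """Concatenate each frame with `left` past and `right` future frames.

    Positions beyond the utterance edges reuse the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window: Vector = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def init_layers(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """HYPOTHETICAL placeholder for trained weights: small random (W, b) per layer."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def bottleneck_features(x: Vector, layers: Sequence[Layer], bn_index: int) -> Vector:
    """Forward x through layers[0..bn_index] and return the bottleneck output.

    Layers before the bottleneck apply sigmoid units. The bottleneck layer's
    linear (pre-activation) output is returned, which is an assumption of this sketch.
    Layers after the bottleneck matter only during training, so they are skipped.
    """
    h = list(x)
    for i, (weights, bias) in enumerate(layers[: bn_index + 1]):
        a = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        h = a if i == bn_index else [sigmoid(v) for v in a]
    return h


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical utterance: 20 frames of 13-dimensional cepstral features.
    frames = [[rng.random() for _ in range(13)] for _ in range(20)]
    inputs = stack_context(frames, left=5, right=5)  # 13 * 11 = 143 dims per frame
    # Hypothetical topology: input -> 64 -> 64 -> 13 (bottleneck) -> 64 -> 10 targets
    sizes = [143, 64, 64, 13, 64, 10]
    layers = init_layers(sizes)
    bn_index = 2  # layers[2] maps 64 -> 13 units, i.e. the bottleneck layer
    feats = [bottleneck_features(x, layers, bn_index) for x in inputs]
    print(len(feats), len(feats[0]))  # 20 13
```

Running the script prints `20 13`, meaning one 13-dimensional bottleneck vector per input frame. In a full system, these vectors would take the place of MFCC or SDC features as input to the language recognition back end, which the abstract does not specify.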
