Computer Science ›› 2014, Vol. 41 ›› Issue (3): 263-266.

• Artificial Intelligence •

New Feature Extraction Method Based on Bottleneck Deep Belief Networks and Its Application in Language Recognition

LI Jin-hui, YANG Jun-an and WANG Yi

  1. Electronic Engineering Institute, Hefei 230037; Anhui Province Key Laboratory of Electronic Restriction Technology, Hefei 230037
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272333)

CLC number: TM344.1    Document code: A

Abstract: In language recognition, traditional MFCC features carry limited information in each frame, so they are easily corrupted by noise and have weak noise robustness. Meanwhile, the widely used SDC feature extraction method depends on manually chosen parameters, which increases the uncertainty of the recognition results. To overcome these drawbacks, this paper introduced deep learning into feature extraction and proposed a bottleneck feature extraction method based on deep belief networks (BN-DBN). Finally, comparative experiments on the bottleneck layer size, the number of hidden layers and the position of the bottleneck layer were carried out on the NIST 2007 database. The results show that, compared with traditional feature extraction methods, the proposed bottleneck features based on deep belief networks achieve a higher recognition rate in language recognition.
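For readers unfamiliar with the SDC parameterization criticized in the abstract, the following minimal sketch (not taken from the paper; plain NumPy, with the common 7-1-3-7 setting used as an assumed default) shows how shifted delta cepstra are stacked from frame-level cepstral coefficients, and why the parameters N, d, P and k must be fixed by hand.

import numpy as np

def sdc(cepstra, N=7, d=1, P=3, k=7):
    """cepstra: (T, >=N) frame-level cepstral coefficients.
    Returns (T, N*k) shifted delta cepstra; N, d, P, k are hand-chosen."""
    T = cepstra.shape[0]
    c = cepstra[:, :N]
    out = np.zeros((T, N * k), dtype=c.dtype)
    for t in range(T):
        for i in range(k):
            plus = min(t + i * P + d, T - 1)    # clamp at utterance boundaries
            minus = max(t + i * P - d, 0)
            out[t, i * N:(i + 1) * N] = c[plus] - c[minus]
    return out

feats = sdc(np.random.randn(100, 13))   # 100 frames of 13-dim cepstra -> (100, 49) SDC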

Key words: Language recognition, Bottleneck features, Deep belief networks
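To make the bottleneck idea concrete, here is a minimal sketch of a bottleneck feature extractor, assuming PyTorch; the layer sizes, sigmoid activations, 14-way language output and the omission of the RBM pre-training stage of a true deep belief network are illustrative assumptions, not the authors' configuration. A network with a narrow hidden layer is trained to classify languages, and the activations of that narrow layer are then kept as frame-level features for a conventional back-end.

import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, input_dim=39, hidden_dim=1024, bottleneck_dim=40, n_classes=14):
        super().__init__()
        # Layers up to and including the narrow bottleneck layer
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bottleneck_dim), nn.Sigmoid(),   # bottleneck layer
        )
        # Layers above the bottleneck, used only while training on language labels
        self.classifier = nn.Sequential(
            nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):
        z = self.encoder(x)              # bottleneck activations = the extracted features
        return self.classifier(z), z

model = BottleneckNet()
frames = torch.randn(8, 39)              # a batch of illustrative acoustic feature frames
logits, bn_features = model(frames)      # bn_features: shape (8, 40), fed to a back-end classifier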

