Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection

Computer Science, 2018, Vol. 45, Issue (9): 308-313. doi: 10.11896/j.issn.1002-137X.2018.09.052

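ZHANG Ai-ying

School of Mathematics and Quantitative Economics, Shandong University of Finance and Economics, Jinan 250014, China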

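Received: 2017-11-17  Online: 2018-09-20  Published: 2018-10-10
Corresponding author: ZHANG Ai-ying (born 1980), female, lecturer. Her main research interests include pattern recognition and digital signal processing. E-mail: ayzhang@sdufe.edu.cn
Funding: This work was supported by the National Natural Science Foundation of China (61305027), the Natural Science Foundation of Shandong Province (ZR2011FQ024) and the Science and Technology Program of Shandong Provincial Higher Education Institutions (J17KB160).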

Abstract

Multilingual information can improve the performance of speech recognition systems for low-resource languages. However, not all multilingual speech data help when such information is used to improve a low-resource target-language recognizer. This paper proposes a data selection method based on long short-term memory (LSTM) recurrent neural network language identification, which selects the more useful multilingual speech data for training a multilingual deep neural network (DNN) and a deep Bottleneck neural network. Both the DNN obtained by cross-lingual transfer learning and the Bottleneck features extracted by the deep Bottleneck neural network substantially improve the performance of the low-resource target-language (Mongolian) speech recognition system. Compared with the baseline system, the proposed systems reduce the word error rate by 10.5% and 11.4% absolute, respectively, when decoding with an interpolated Web-based language model.

Key words: Data selection, Deep Bottleneck neural network, Low-resource, Multilingual deep neural network
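The abstract does not state how the language-identification (LID) output is turned into a selection decision, so the following is only a minimal sketch of one plausible reading, under explicit assumptions: an LSTM LID model whose output classes include Mongolian has already produced frame-level posteriors for every utterance in the multilingual pool; each utterance is scored by averaging the Mongolian posterior over its frames; and other-language utterances whose score reaches a threshold are kept for multilingual DNN and Bottleneck training. The `Utterance` type, the `target_score` and `select_utterances` helpers, the `min_score` threshold, and the language labels are hypothetical illustrations, not the paper's actual criterion.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# ASSUMPTION: illustrative selection rule only, not the paper's exact method.
# Each utterance carries frame-level LID posteriors computed beforehand by an
# LSTM language-identification model whose classes include the target language.


@dataclass
class Utterance:
    utt_id: str
    language: str  # language the utterance was recorded in
    frame_posteriors: List[Dict[str, float]]  # one {language: posterior} dict per frame


def target_score(utt: Utterance, target: str) -> float:
    """Average the target-language posterior over all frames of the utterance."""
    if not utt.frame_posteriors:
        return 0.0
    total = sum(frame.get(target, 0.0) for frame in utt.frame_posteriors)
    return total / len(utt.frame_posteriors)


def select_utterances(utts: Sequence[Utterance], target: str, min_score: float) -> List[str]:
    """Return IDs of other-language utterances scoring at least min_score, best first."""
    scored = [
        (target_score(u, target), u.utt_id)
        for u in utts
        if u.language != target  # ASSUMPTION: only other-language data is filtered
    ]
    kept = [(score, utt_id) for score, utt_id in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [utt_id for _, utt_id in kept]


# Hypothetical pool: labels and posterior values are made up for illustration.
pool = [
    Utterance("kaz_001", "kazakh", [{"mongolian": 0.6, "kazakh": 0.4}, {"mongolian": 0.8, "kazakh": 0.2}]),
    Utterance("eng_001", "english", [{"mongolian": 0.1, "english": 0.9}]),
    Utterance("tur_001", "turkish", [{"mongolian": 0.3, "turkish": 0.7}, {"mongolian": 0.5, "turkish": 0.5}]),
    Utterance("mon_001", "mongolian", [{"mongolian": 0.95, "kazakh": 0.05}]),
]

print(select_utterances(pool, "mongolian", min_score=0.35))  # ['kaz_001', 'tur_001']
```

A fixed threshold keeps the sketch short; ranking the pool and keeping a fixed fraction, or a fixed number of hours per source language, would be an equally plausible variant of LID-based selection.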

CLC number: TP391.42

ZHANG Ai-ying. Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection[J]. Computer Science, 2018, 45(9): 308-313. https://doi.org/10.11896/j.issn.1002-137X.2018.09.052

