Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection

doi:10.11896/j.issn.1002-137X.2018.09.052

Abstract

Multilingual information can improve the performance of low-resource automatic speech recognition (ASR) systems. However, not all multilingual speech data contribute to that improvement. This paper proposes a data selection method based on language identification with a long short-term memory recurrent neural network, and uses it to improve the performance of a low-resource ASR system. The selected, more effective multilingual speech data are used to train a multilingual deep neural network and a deep bottleneck neural network. Both the deep neural network model obtained through transfer learning and the bottleneck features extracted with the deep bottleneck neural network help improve speech recognition for the low-resource target language. Compared with the baseline system, absolute word error rate reductions of 10.5% and 11.4% are obtained when an interpolated web-based language model is used for decoding.
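
The abstract does not state the exact selection criterion, so the following is only a minimal sketch of language-identification-based data selection under stated assumptions. It assumes each multilingual utterance already carries a posterior probability for the target language produced by some LID model, and that utterances scoring at or above a threshold are kept. The `Utterance` class, the `select_utterances` function, the threshold value, and the sample scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    language: str            # source-language label (placeholder names here)
    target_posterior: float  # assumed LID posterior for the target language


def select_utterances(utterances, threshold=0.5, max_count=None):
    """Keep utterances whose target-language posterior reaches the threshold.

    The result is ordered from most to least target-like; max_count
    optionally caps how many utterances are returned.
    """
    kept = [u for u in utterances if u.target_posterior >= threshold]
    kept.sort(key=lambda u: u.target_posterior, reverse=True)
    return kept if max_count is None else kept[:max_count]


# Made-up scores for illustration only; they are not experimental data.
pool = [
    Utterance("a_001", "lang_a", 0.71),
    Utterance("b_001", "lang_b", 0.12),
    Utterance("a_002", "lang_a", 0.93),
    Utterance("c_001", "lang_c", 0.64),
    Utterance("b_002", "lang_b", 0.08),
]

selected = select_utterances(pool, threshold=0.5)
selected_ids = [u.utt_id for u in selected]
```

In this sketch the selected subset would then serve as training data for the multilingual networks. A threshold is only one possible rule; ranking and keeping a fixed number of utterances (via `max_count`) is an equally plausible alternative.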

Key words: Data selection, Deep Bottleneck neural network, Low-resource, Multilingual deep neural network

CLC Number: TP391.42

ZHANG Ai-ying. Research on Low-resource Mongolian Speech Recognition Based on Multilingual Speech Data Selection [J]. Computer Science, 2018, 45(9): 308-313.


URL: https://www.jsjkx.com/EN/10.11896／j.issn.1002-137X.2018.09.052

https://www.jsjkx.com/EN/Y2018/V45/I9/308

