Computer Science, 2017, Vol. 44, Issue (10): 318-322. doi: 10.11896/j.issn.1002-137X.2017.10.057


Research on Low-resource Mongolian Speech Recognition

ZHANG Ai-ying and NI Chong-jia   

Online: 2018-12-01    Published: 2018-12-01

Abstract: With the development of speech recognition technology, research on low-resource speech recognition has attracted extensive attention. Taking Mongolian as the target language, we studied how multilingual information can be used to improve recognition performance under low-resource conditions, where, for example, only 10 hours of transcribed speech are available for acoustic modeling. A more discriminative acoustic model is obtained through cross-lingual transfer with a multilingual deep neural network and multilingual deep bottleneck features. A large number of web pages collected with a search engine and a Web crawler supplies additional text data for improving the language model. Fusing the recognition results produced by different numbers of recognizers further improves accuracy. Compared with the recognition result of the baseline system, the fused result yields a reduction in word error rate (WER) of nearly 12% absolute.
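The cross-lingual transfer described in the abstract is commonly realized by training one network on several languages with shared hidden layers and language-dependent output layers, then reusing the shared stack, or its low-dimensional bottleneck outputs, for the low-resource language. Below is a minimal sketch of that idea in PyTorch; the layer sizes, the 42-dimensional bottleneck, and the language set are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch (not the authors' code) of a multilingual DNN acoustic model
# with shared hidden layers, a bottleneck layer, and one softmax head per
# training language. All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class MultilingualBottleneckDNN(nn.Module):
    def __init__(self, feat_dim, senones_per_lang, hidden_dim=1024, bn_dim=42):
        super().__init__()
        # Hidden layers shared across all training languages.
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bn_dim),   # low-dimensional bottleneck layer
        )
        # One language-specific output layer (senone classifier) per language.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(bn_dim, n_senones)
            for lang, n_senones in senones_per_lang.items()
        })

    def forward(self, feats, lang):
        bn = self.shared(feats)        # bottleneck activations
        return self.heads[lang](bn)    # senone logits for the given language

    def bottleneck_features(self, feats):
        # After multilingual training, the bottleneck outputs can serve as
        # language-independent features for the low-resource (Mongolian) system.
        with torch.no_grad():
            return self.shared(feats)

# Illustrative usage: Mongolian plus two hypothetical donor languages,
# with random frames standing in for spliced acoustic features.
model = MultilingualBottleneckDNN(
    feat_dim=440, senones_per_lang={"mn": 2000, "zh": 4000, "en": 4000})
frames = torch.randn(8, 440)               # a batch of feature frames
logits = model(frames, lang="mn")          # train with per-language cross-entropy
bnf = model.bottleneck_features(frames)    # 42-dim bottleneck features
```

In this setup the shared layers (and the bottleneck features they produce) absorb knowledge from all languages, while only the small language-specific heads depend on the limited Mongolian data.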

Key words: Low-resource, Multilingual deep neural network, Web-based language model

