Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (10): 318-322. doi: 10.11896/j.issn.1002-137X.2017.10.057
• Graphics, Images and Pattern Recognition •
ZHANG Ai-ying (张爱英) and NI Chong-jia (倪崇嘉)
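Abstract: As speech recognition technology has advanced, research on recognition systems for low-resource languages has attracted growing attention. Taking Mongolian as the target language, this paper studies how information from other languages can improve recognition performance when resources are scarce (for example, when only 10 hours of transcribed speech are available). Cross-lingual transfer learning based on a multilingual deep neural network, together with features extracted by a multilingual deep bottleneck neural network, yields a more discriminative acoustic model. Large amounts of web page data, collected through search engines and targeted web crawling, supply text data that strengthens the language model. The outputs of several different recognition systems are then fused to further improve accuracy. Compared with the baseline system, the fused system reduces the recognition error rate by 12% absolute.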
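The abstract gives no architectural details, so the following is only a minimal forward-pass sketch of the general shared-hidden-layer idea behind multilingual DNN transfer: hidden layers are shared across languages, each language has its own softmax output layer, and the target language reuses the shared layers through a newly attached output layer. The layer sizes, the sigmoid activation, the language labels (`source_a`, `source_b`, `mongolian`) and the class name `MultilingualDNN` are all illustrative assumptions, not the authors' configuration. Training is omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for a layer produced by init_layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultilingualDNN:
    """Hypothetical model: shared hidden layers, one output layer per language."""

    def __init__(self, n_in, hidden_sizes, seed=0):
        self.rng = random.Random(seed)
        sizes = [n_in] + list(hidden_sizes)
        self.shared = [init_layer(a, b, self.rng) for a, b in zip(sizes, sizes[1:])]
        self.heads = {}

    def add_language(self, name, n_states):
        """Attach a language-specific output layer on top of the shared layers."""
        last_hidden_size = len(self.shared[-1][1])
        self.heads[name] = init_layer(last_hidden_size, n_states, self.rng)

    def hidden(self, x):
        """Run the shared hidden layers (the part reused across languages)."""
        for layer in self.shared:
            x = sigmoid(affine(layer, x))
        return x

    def posteriors(self, name, x):
        """State posteriors for one language given one feature frame."""
        return softmax(affine(self.heads[name], self.hidden(x)))


# Source languages would be trained jointly (training not shown here).
net = MultilingualDNN(n_in=40, hidden_sizes=[64, 64])
net.add_language("source_a", 120)  # assumed output sizes
net.add_language("source_b", 150)

# Cross-lingual transfer: keep the shared layers, add a new head for the target.
shared_before = net.shared
net.add_language("mongolian", 100)
post = net.posteriors("mongolian", [0.0] * 40)
```

In a setup like this, only the new output layer (and optionally the shared layers, at a small learning rate) would be trained on the limited target-language data, which is why sharing hidden layers is attractive when only a few hours of transcribed speech exist.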