Computer Science ›› 2022, Vol. 49 ›› Issue (1): 47-52. doi: 10.11896/jsjkx.210900013
• Multilingual Computing Advanced Technology •
CHENG Gao-feng1, YAN Yong-hong1,2
[1] HINTON G,DENG L,YU D,et al. Deep neural networks for acoustic modeling in speech recognition:the shared views of four research groups[J]. IEEE Signal Processing Magazine,2012,29(6):82-97.
[2] POVEY D,PEDDINTI V,GALVEZ D,et al. Purely sequence-trained neural networks for ASR based on lattice-free MMI[C]//Interspeech. 2016:2751-2755.
[3] GRAVES A,FERNÁNDEZ S,GOMEZ F,et al. Connectionist temporal classification:labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. 2006:369-376.
[4] LIU C,ZHANG Q,ZHANG X,et al. Multilingual graphemic hybrid ASR with massive data augmentation[J]. arXiv:1909.06522,2019.
[5] TONG S,GARNER P N,BOURLARD H. An investigation of multilingual ASR using end-to-end LF-MMI[C]//IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP 2019). IEEE,2019:6061-6065.
[6] TONG S,GARNER P N,BOURLARD H. Cross-lingual adaptation of a CTC-based multilingual acoustic model[J]. Speech Communication,2018,104:39-46.
[7] TONG S,GARNER P N,BOURLARD H. Fast Language Adaptation Using Phonological Information[C]//INTERSPEECH. 2018:2459-2463.
[8] HSU J Y,CHEN Y J,LEE H. Meta learning for end-to-end low-resource speech recognition[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2020:7844-7848.
[9] DALMIA S,SANABRIA R,METZE F,et al. Sequence-based multi-lingual low resource speech recognition[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:4909-4913.
[10] CHEN Y C,HSU J Y,LEE C K,et al. DARTS-ASR:Differentiable architecture search for multilingual speech recognition and adaptation[J]. arXiv:2005.07029,2020.
[11] THOMAS S,AUDHKHASI K,KINGSBURY B. Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings[C]//INTERSPEECH. 2020:4736-4740.
[12] GRAVES A. Sequence transduction with recurrent neural networks[J]. arXiv:1211.3711,2012.
[13] CHAN W,JAITLY N,LE Q,et al. Listen,attend and spell:A neural network for large vocabulary conversational speech recognition[C]//2016 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2016:4960-4964.
[14] PRATAP V,SRIRAM A,TOMASELLO P,et al. Massively multilingual ASR:50 languages,1 model,1 billion parameters[J]. arXiv:2007.03001,2020.
[15] LI B,PANG R,SAINATH T N,et al. Scaling end-to-end models for large-scale multilingual ASR[J]. arXiv:2104.14830,2021.
[16] DATTA A,RAMABHADRAN B,EMOND J,et al. Language agnostic multilingual modeling[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2020:8239-8243.
[17] KARAFIÁT M,BASKAR M K,WATANABE S,et al. Analysis of multilingual sequence-to-sequence speech recognition systems[J]. arXiv:1811.03451,2018.
[18] ADAMS O,WIESNER M,WATANABE S,et al. Massively multilingual adversarial speech recognition[J]. arXiv:1904.02210,2019.
[19] CHO J,BASKAR M K,LI R,et al. Multilingual sequence-to-sequence speech recognition:architecture,transfer learning,and language modeling[C]//2018 IEEE Spoken Language Technology Workshop (SLT). IEEE,2018:521-527.
[20] ZHOU S,XU S,XU B. Multilingual end-to-end speech recognition with a single transformer on low-resource languages[J]. arXiv:1806.05059,2018.
[21] LI B,ZHANG Y,SAINATH T,et al. Bytes are all you need:end-to-end multilingual speech recognition and synthesis with bytes[C]//ICASSP 2019-2019 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2019:5621-5625.
[22] HOU W,DONG Y,ZHUANG B,et al. Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning[C]//INTERSPEECH. 2020:1037-1041.
[23] WATANABE S,HORI T,HERSHEY J R. Language independent end-to-end architecture for joint language identification and speech recognition[C]//2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE,2017:265-271.
[24] POVEY D,GHOSHAL A,BOULIANNE G,et al. The Kaldi speech recognition toolkit[C]//IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society,2011.
[25] CAI W,CAI Z,ZHANG X,et al. A novel learnable dictionary encoding layer for end-to-end language identification[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:5189-5193.
[26] CAI W,CAI Z,LIU W,et al. Insights into end-to-end learning scheme for language identification[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:5209-5213.
[27] MIAO X,MCLOUGHLIN I. LSTM-TDNN with convolutional front-end for dialect identification in the 2019 multi-genre broadcast challenge[J]. arXiv:1912.09003,2019.
[28] MIAO X,MCLOUGHLIN I,YAN Y. A New Time-Frequency Attention Tensor Network for Language Identification[J]. Circuits,Systems,and Signal Processing,2020,39(5):2744-2758.
[29] BEDYAKIN R,MIKHAYLOVSKIY N. Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions[J]. arXiv:2106.00052,2021.
[30] TJANDRA A,CHOUDHURY D G,ZHANG F,et al. Improved language identification through cross-lingual self-supervised learning[J]. arXiv:2107.04082,2021.
[31] KANNAN A,DATTA A,SAINATH T N,et al. Large-scale multilingual speech recognition with a streaming end-to-end model[C]//Proc. Interspeech 2019. 2019:2130-2134.
[32] TOSHNIWAL S,SAINATH T N,WEISS R J,et al. Multilingual speech recognition with a single end-to-end model[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:4904-4908.
[33] PUNJABI S,ARSIKERE H,RAEESY Z,et al. Streaming end-to-end bilingual ASR systems with joint language identification[J]. arXiv:2007.03900,2020.
[34] MÜLLER M,STÜKER S,WAIBEL A. Multilingual adaptation of RNN based ASR systems[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:5219-5223.
[35] SEKI H,WATANABE S,HORI T,et al. An end-to-end language-tracking speech recognizer for mixed-language speech[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:4919-4923.
[36] WATERS A,GAUR N,HAGHANI P,et al. Leveraging language ID in multilingual end-to-end speech recognition[C]//2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE,2019:928-935.
[37] PUNJABI S,ARSIKERE H,RAEESY Z,et al. Joint ASR and language identification using RNN-T:An efficient approach to dynamic language switching[C]//ICASSP 2021-2021 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2021:7218-7222.
[38] LIU D,WAN X,XU J,et al. Multilingual Speech Recognition Training and Adaptation with Language-Specific Gate Units[C]//2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE,2018:86-90.
[39] LIU D,XU J,ZHANG P,et al. A unified system for multilingual speech recognition and language identification[J]. Speech Communication,2021,127:17-28.
[40] LIU D,XU J,ZHANG P. End-to-End Multilingual Speech Recognition System with Language Supervision Training[J]. IEICE Transactions on Information and Systems,2020,103(6):1427-1430.
[41] KIM S,SELTZER M L. Towards language-universal end-to-end speech recognition[C]//Proc. of the IEEE International Conference on Acoustics,Speech and Signal Processing. 2018:4914-4918.
[42] YI J,TAO J,WEN Z,et al. Adversarial multilingual training for low-resource speech recognition[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:4899-4903.
[43] YI J,TAO J,WEN Z,et al. Language-adversarial transfer learning for low-resource speech recognition[J]. IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,27(3):621-630.
[44] STOLCKE A. SRILM:an extensible language modeling toolkit[C]//Proc. of the International Conference on Spoken Language Processing. 2002:901-904.
[45] WELLS J. SAMPA computer readable phonetic alphabet[M]//Handbook of Standards and Resources for Spoken Language Systems. Berlin and New York:Mouton de Gruyter,1997.
[46] HAMPSHIRE J B,WAIBEL A H. A novel objective function for improved phoneme recognition using time-delay neural networks[C]//Proc. of the International 1989 Joint Conference on Neural Networks. 1989:235-241.
[47] WAIBEL A,HANAZAWA T,HINTON G,et al. Phoneme recognition using time-delay neural networks[J]. IEEE Transactions on Acoustics,Speech,and Signal Processing,1989,37(3):328-339.
[48] HAMPSHIRE J B,WAIBEL A H. A novel objective function for improved phoneme recognition using time-delay neural networks[J]. IEEE Transactions on Neural Networks,1990,1(2):216-228.
[49] CHOROWSKI J,BAHDANAU D,SERDYUK D,et al. Attention-based models for speech recognition[C]//Advances in Neural Information Processing Systems 28:Annual Conference on Neural Information Processing Systems 2015. 2015:577-585.
[50] LI J,YE G,DAS A,et al. Advancing acoustic-to-word CTC model[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2018:5794-5798.
[51] YUAN Y,LEUNG C C,XIE L,et al. Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection[C]//2017 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE,2017:5645-5649.
[52] RAM D,MICULICICH L,BOURLARD H. Multilingual bottleneck features for query by example spoken term detection[C]//2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE,2019:621-628.
[53] RAM D,MICULICICH L,BOURLARD H. Neural network based end-to-end query by example spoken term detection[J]. IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:1416-1427.
[54] WATANABE S,HORI T,KIM S,et al. Hybrid CTC/attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing,2017,11(8):1240-1253.
[55] WATANABE S,HORI T,KARITA S,et al. ESPnet:end-to-end speech processing toolkit[C]//Interspeech. 2018:2207-2211.
[56] VASWANI A,SHAZEER N,PARMAR N,et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017:5998-6008.
[57] GAGE P. A new algorithm for data compression[J]. C Users Journal,1994,12(2):23-38.
[58] SENNRICH R,HADDOW B,BIRCH A. Neural machine translation of rare words with subword units[J]. arXiv:1508.07909,2015.