Computer Science ›› 2023, Vol. 50 ›› Issue (8): 111-117. doi: 10.11896/jsjkx.220600144
• Computer Graphics & Multimedia •
TENG Sihang, WANG Lie, LI Ya