Computer Science ›› 2022, Vol. 49 ›› Issue (6A): 331-336. doi: 10.11896/jsjkx.210500180
• Image Processing & Multimedia Technology •
LI Sun, CAO Feng
[1] DAVIS K H. Automatic Recognition of Spoken Digits [J]. Journal of the Acoustical Society of America, 1952, 24(6): 669.
[2] ATAL B S, HANAUER S L. Speech Analysis and Synthesis by Linear Prediction of the Speech Wave [J]. Journal of the Acoustical Society of America, 1971, 50(2): 637-655.
[3] ITAKURA F, SAITO S. A Statistical Method for Estimation of Speech Spectral Density and Formant Frequencies [J]. Electronics and Communications in Japan, 1970, 53(A): 36-43.
[4] HERMANSKY H. Perceptual Linear Predictive (PLP) Analysis of Speech [J]. Journal of the Acoustical Society of America, 1990, 87(4): 1738-1752.
[5] BAUM L E, EAGON J A. An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology [J]. Bulletin of the American Mathematical Society, 1967, 73: 360-363.
[6] BAUM L E, SELL G R. Growth Transformations for Functions on Manifolds [J]. Pacific Journal of Mathematics, 1968, 27(2): 211-227.
[7] BAUM L E, PETRIE T, SOULES G, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains [J]. Annals of Mathematical Statistics, 1970, 41(1): 164-171.
[8] BAUM L E. An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes [M]. Inequalities, 1972, 3: 1-8.
[9] BOURLARD H A, MORGAN N. Connectionist Speech Recognition: A Hybrid Approach [M]. Springer US, 1994.
[10] HINTON G, DENG L, YU D, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups [J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[11] GRAVES A, FERNANDEZ S, GOMEZ F, et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks [C] // Proceedings of the 23rd International Conference on Machine Learning (ICML). Pittsburgh, USA, 2006.
[12] GRAVES A. Supervised Sequence Labelling with Recurrent Neural Networks [M]. Vol. 385. Springer, 2012.
[13] GRAVES A. Sequence Transduction with Recurrent Neural Networks [C] // ICML Representation Learning Workshop. 2012.
[14] GRAVES A. Sequence Transduction with Recurrent Neural Networks [J]. Computer Science, 2012, 58(3): 235-242.
[15] CHAN W, JAITLY N, LE Q, et al. Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition [C] // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016.
[16] ZHANG B, QUAN C Q, REN F J. Overview of Speech Synthesis Methods and Development [J]. Journal of Chinese Computer Systems, 2016, 37(1): 186-192.
[17] KLATT D H. Software for a Cascade/Parallel Formant Synthesizer [J]. Journal of the Acoustical Society of America, 1980, 67(3): 971-995.
[18] BLACK A W, CAMPBELL N. Optimising Selection of Units from Speech Databases for Concatenative Synthesis [J/OL]. 1996. https://www.researchgate.net/publication/2580972_Optimising_Selection_Of_Units_From_Speech_Databases_For_Concatenative_Synthesis.
[19] MASUKO T, TOKUDA K, KOBAYASHI T, et al. HMM-based Speech Synthesis with Various Voice Characteristics [J]. Journal of the Acoustical Society of America, 1996, 100(4): 2760.
[20] WANG W F, XU S, XU B. First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention [C] // Proceedings of Interspeech. 2016: 2243-2247.
[21] VAN DEN OORD A, DIELEMAN S, ZEN H, et al. WaveNet: A Generative Model for Raw Audio [C] // Proceedings of the 9th ISCA Speech Synthesis Workshop. Seoul: ISCA, 2016: 125.
[22] ARIK S O, CHRZANOWSKI M, COATES A, et al. Deep Voice: Real-time Neural Text-to-Speech [C] // Proceedings of the 34th International Conference on Machine Learning (ICML'17). Sydney: ACM, 2017: 195-204.
[23] GIBIANSKY A, ARIK S, DIAMOS G, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech [C] // Advances in Neural Information Processing Systems. 2017: 2962-2970.
[24] PING W, PENG K, GIBIANSKY A, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning [J/OL]. 2017. https://arxiv.org/pdf/1710.07654.pdf.
[25] SOTELO J, MEHRI S, KUMAR K, et al. Char2Wav: End-to-End Speech Synthesis [C] // Proceedings of the ICLR 2017 Workshop. Toulon: ICLR, 2017: 24-26.
[26] WANG Y, SKERRY-RYAN R, STANTON D, et al. Tacotron: Towards End-to-End Speech Synthesis [C] // Interspeech 2017. Stockholm: ISCA, 2017: 4006-4010.
[27] SHEN J, PANG R, WEISS R J, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [C] // Proceedings of the 2018 International Conference on Acoustics, Speech, and Signal Processing. Calgary: IEEE, 2018: 4779-4783.
[28] JIA Y, JOHNSON M, MACHEREY W, et al. Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation [C] // 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019). IEEE, 2019.
[29] JIA Y, WEISS R J, BIADSY F, et al. Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model [C] // Interspeech 2019. 2019.
[30] WANG Y, SKERRY-RYAN R, STANTON D, et al. Tacotron: Towards End-to-End Speech Synthesis [C] // Interspeech 2017. 2017.
[31] SHEN J, PANG R, WEISS R J, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [C] // 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018). IEEE, 2018.
[32] PRABHAVALKAR R, RAO K, SAINATH T N, et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition [C] // Interspeech 2017. 2017.
[33] WATANABE S, HORI T, KARITA S, et al. ESPnet: End-to-End Speech Processing Toolkit [C] // Interspeech 2018. 2018.
[34] PHAM N Q, NGUYEN T S, NIEHUES J, et al. Very Deep Self-Attention Networks for End-to-End Speech Recognition [J/OL]. 2019. https://arxiv.org/abs/1904.13377.
[35] YUAN Z, LYU Z, LI J, et al. An Improved Hybrid CTC-Attention Model for Speech Recognition [J/OL]. 2018. https://arxiv.org/abs/1810.12020.
[36] CHAN W, JAITLY N, LE Q, et al. Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition [C] // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016.
[37] XIAO X, WATANABE S, ERDOGAN H, et al. Deep Beamforming Networks for Multi-Channel Speech Recognition [C] // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016.
[38] ANUMANCHIPALLI G K, CHARTIER J, CHANG E F. Speech Synthesis from Neural Decoding of Spoken Sentences [J]. Nature, 2019, 568(7753): 493-498.
[39] CHIU C C, SAINATH T N, WU Y, et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models [C] // 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018). IEEE, 2018.