Computer Science ›› 2019, Vol. 46 ›› Issue (1): 260-264. doi: 10.11896/j.issn.1002-137X.2019.01.040

• Artificial Intelligence •

Attention-based Acoustic Model Combining Bottleneck Features

LONG Xing-yan, QU Dan, ZHANG Wen-lin   

(Information System Engineering College, PLA Information Engineering University, Zhengzhou 450001, China)
• Received: 2017-12-08  Online: 2019-01-15  Published: 2019-02-25

Abstract: Attention-based sequence-to-sequence acoustic models have become a research hotspot in speech recognition. To address their long training time and poor robustness, this paper proposed an acoustic model that combines bottleneck features. The model consists of a bottleneck feature extraction network based on a deep belief network (DBN) and an attention-based sequence-to-sequence model. The DBN introduces prior information from the traditional acoustic model, which speeds up convergence and enhances the robustness and discriminability of the bottleneck features. The attention model exploits the temporal information of the speech feature sequence to compute the posterior probability of the phoneme sequence. Relative to the baseline system, training time is reduced by decreasing the number of recurrent layers in the attention model, and recognition accuracy is improved by adjusting the input dimension and output size of the bottleneck feature extraction network. Experiments on the TIMIT dataset show that, on the core test set, the phoneme error rate decreases to 17.80%, the average training time per iteration decreases by 52%, and the number of training epochs decreases from 139 to 89.
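The following is a minimal sketch (in PyTorch, not the authors' original implementation) of the two-stage architecture the abstract describes: a bottleneck feature extractor whose narrow hidden layer feeds an attention-based encoder-decoder that outputs phoneme posteriors. All class names, layer sizes, the 440-dimensional spliced input, and the additive-attention form are illustrative assumptions, not values reported in the paper.

import torch
import torch.nn as nn

class BottleneckExtractor(nn.Module):
    # Feed-forward network (DBN-pretrained in the paper); the narrow bottleneck
    # layer's activations are used as the acoustic features.
    def __init__(self, in_dim=440, hidden=1024, bn_dim=40):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid(),
                                   nn.Linear(hidden, hidden), nn.Sigmoid())
        self.bottleneck = nn.Linear(hidden, bn_dim)

    def forward(self, x):                       # x: (batch, frames, in_dim)
        return self.bottleneck(self.front(x))   # (batch, frames, bn_dim)

class AttentionDecoder(nn.Module):
    # Additive (Bahdanau-style) attention over encoder states; a GRU cell emits
    # one phoneme posterior per decoding step.
    def __init__(self, enc_dim, dec_dim, n_phones):
        super().__init__()
        self.key_proj = nn.Linear(enc_dim, dec_dim)
        self.query_proj = nn.Linear(dec_dim, dec_dim)
        self.score = nn.Linear(dec_dim, 1)
        self.cell = nn.GRUCell(enc_dim + n_phones, dec_dim)
        self.out = nn.Linear(dec_dim, n_phones)

    def forward(self, enc, targets):            # enc: (B, T, enc_dim); targets: (B, L, n_phones) one-hot
        state = enc.new_zeros(enc.size(0), self.cell.hidden_size)
        keys = self.key_proj(enc)                # precompute encoder-side projections
        logits = []
        for t in range(targets.size(1)):
            energy = self.score(torch.tanh(keys + self.query_proj(state).unsqueeze(1)))
            alpha = torch.softmax(energy, dim=1)           # attention weights over frames
            context = (alpha * enc).sum(dim=1)             # weighted sum of encoder states
            state = self.cell(torch.cat([context, targets[:, t]], dim=-1), state)
            logits.append(self.out(state))
        return torch.stack(logits, dim=1)                  # (B, L, n_phones), pre-softmax

class BNFAttentionASR(nn.Module):
    def __init__(self, in_dim=440, bn_dim=40, enc_dim=256, dec_dim=256, n_phones=62):
        super().__init__()
        self.bnf = BottleneckExtractor(in_dim, bn_dim=bn_dim)
        self.encoder = nn.GRU(bn_dim, enc_dim // 2, batch_first=True, bidirectional=True)
        self.decoder = AttentionDecoder(enc_dim, dec_dim, n_phones)

    def forward(self, feats, targets):
        bn_feats = self.bnf(feats)               # bottleneck features replace raw spliced frames
        enc, _ = self.encoder(bn_feats)          # bidirectional GRU encoder
        return self.decoder(enc, targets)        # phoneme posteriors via attention decoding

During training, targets would carry the ground-truth phoneme labels (teacher forcing); at decoding time the model would instead feed back its own predictions until an end-of-sequence symbol is produced.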

Key words: Acoustic model, Attention model, Bottleneck feature, Deep belief network

CLC Number: TP391