Computer Science (计算机科学), 2019, Vol. 46, Issue (1): 260-264. doi: 10.11896/j.issn.1002-137X.2019.01.040
龙星延, 屈丹, 张文林
LONG Xing-yan, QU Dan, ZHANG Wen-lin
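Abstract: Attention-based sequence-to-sequence acoustic models have become a research hotspot in speech recognition. To address the long training time and poor robustness of such models, this paper proposes an attention acoustic model that incorporates bottleneck features. The model consists of two parts: a bottleneck feature extraction network based on a deep belief network (DBN) and an attention-based sequence-to-sequence model. The DBN introduces prior information from traditional acoustic models to speed up convergence, and it also makes the bottleneck features more robust and more discriminative. The attention model uses the temporal information in the speech feature sequence to compute the posterior probability of the phoneme sequence. Starting from the baseline system, training time is reduced by decreasing the number of recurrent neural network layers in the attention model, and recognition accuracy is optimized by varying the number of input-layer units and bottleneck-layer units in the bottleneck feature extraction network. Experiments on the TIMIT database show that the model lowers the phoneme error rate on the test set to 17.80%, shortens the average duration of a training iteration by 52%, and reduces the number of training iterations from 139 to 89.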
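The two-part structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the layer sizes, the random weights (standing in for DBN-pretrained parameters), the helper names `bottleneck_features` and `attention_context`, and the use of plain dot-product attention in place of the paper's attention mechanism are all assumptions made for illustration, and the recurrent encoder and decoder are omitted entirely.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    # Random weights stand in for DBN-pretrained parameters (assumption).
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, x):
    # One fully connected layer with a sigmoid activation, no bias term.
    return [sigmoid(sum(w * v for w, v in zip(row, x))) for row in layer]


def bottleneck_features(frame, layers, bottleneck_index):
    # Propagate the frame and stop at the narrow layer; its activations
    # are the bottleneck features passed on to the attention model.
    h = frame
    for i, layer in enumerate(layers):
        h = forward(layer, h)
        if i == bottleneck_index:
            return h
    raise ValueError("bottleneck_index out of range")


def attention_context(query, states):
    # Dot-product attention (a simplification): softmax over the scores,
    # then a weighted sum of the states.
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [sum(w * st[d] for w, st in zip(weights, states)) for d in range(dim)]
    return weights, context


rng = random.Random(0)
input_dim, hidden_dim, bottleneck_dim = 12, 16, 4  # hypothetical sizes
layers = [
    make_layer(input_dim, hidden_dim, rng),
    make_layer(hidden_dim, bottleneck_dim, rng),  # the bottleneck layer
    make_layer(bottleneck_dim, hidden_dim, rng),
]
frames = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)] for _ in range(5)]
features = [bottleneck_features(f, layers, bottleneck_index=1) for f in frames]
weights, context = attention_context(features[0], features)
```

In this sketch, `input_dim` and `bottleneck_dim` correspond to the two quantities the abstract says were varied to optimize recognition accuracy (input-layer units and bottleneck-layer units); the specific values here are placeholders, not the paper's settings.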