Computer Science ›› 2019, Vol. 46 ›› Issue (1): 260-264. doi: 10.11896/j.issn.1002-137X.2019.01.040

• Artificial Intelligence •

Attention-based Acoustic Model Combining Bottleneck Features

LONG Xing-yan, QU Dan, ZHANG Wen-lin

  1. (Information System Engineering College, PLA Information Engineering University, Zhengzhou 450001, China)
  • Received: 2017-12-08 Online: 2019-01-15 Published: 2019-02-25
  • About the authors: LONG Xing-yan (born 1992), male, M.S. candidate; his research interests include speech recognition and artificial intelligence. QU Dan (born 1974), female, Ph.D., associate professor; her research interests include speech signal processing and intelligent information processing; E-mail: qudanqudan@sina.com (corresponding author). ZHANG Wen-lin (born 1982), male, Ph.D., associate professor; his research interests include speech recognition and artificial intelligence.
  • Supported by: National Natural Science Foundation of China (61673395, 61403415) and Natural Science Foundation of Henan Province (162300410331).

Abstract: Attention-based sequence-to-sequence acoustic models have recently become a research focus in speech recognition. To address their long training time and poor robustness, this paper proposed an acoustic model that combines bottleneck features. The model consists of two parts: a bottleneck feature extraction network based on a deep belief network (DBN) and an attention-based sequence-to-sequence model. The DBN introduces prior information from a traditional acoustic model, which speeds up convergence and enhances the robustness and discriminability of the bottleneck features; the attention model uses the temporal information of the speech feature sequence to compute the posterior probability of the phoneme sequence. Starting from the baseline system, training time is reduced by decreasing the number of recurrent-neural-network layers in the attention model, and recognition accuracy is optimized by varying the numbers of units in the input layer and the bottleneck layer of the feature extraction network. Experiments on the TIMIT corpus show that on the core test set the phoneme error rate falls to 17.80%, the average time per training iteration is reduced by 52%, and the number of training epochs decreases from 139 to 89.

Key words: Acoustic model, Attention model, Bottleneck feature, Deep belief network
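The first component described in the abstract, a bottleneck feature extractor, takes the activations of a deliberately narrow hidden layer as a compact, discriminative feature representation. The following is a minimal NumPy sketch of that idea only; the layer sizes, activation function, and random weights are illustrative assumptions, not values from the paper (the paper's network is DBN-pretrained rather than randomly initialized):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: wide hidden layers around a narrow bottleneck.
# 40-dim input features, 42-unit bottleneck (all values illustrative).
sizes = [40, 512, 512, 42, 512]
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def bottleneck_features(frames, bottleneck_layer=3):
    """Forward pass through the network; return the activations of the
    narrow bottleneck layer, which serve as the extracted features."""
    h = frames
    for i, (w, b) in enumerate(zip(weights, biases), start=1):
        h = np.tanh(h @ w + b)
        if i == bottleneck_layer:
            return h
    return h

frames = rng.standard_normal((100, 40))  # 100 frames of 40-dim acoustic features
feats = bottleneck_features(frames)
print(feats.shape)                       # (100, 42)
```

In the paper's setting these bottleneck activations, rather than the raw acoustic features, are what the attention-based sequence-to-sequence model consumes.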

CLC number: TP391
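The second component, the attention model, scores every encoder output (one per speech frame) against the current decoder state, normalizes the scores with a softmax, and forms a context vector as the weighted sum of encoder outputs. A minimal NumPy sketch of one such attention step follows; the additive (Bahdanau-style) scorer and all dimensions are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

T, enc_dim, dec_dim, att_dim = 6, 42, 32, 16  # illustrative sizes

W = rng.standard_normal((att_dim, dec_dim))   # projects the decoder state
U = rng.standard_normal((att_dim, enc_dim))   # projects each encoder output
v = rng.standard_normal(att_dim)              # scoring vector

H = rng.standard_normal((T, enc_dim))         # encoder outputs, one per frame
s = rng.standard_normal(dec_dim)              # current decoder state

def attend(H, s):
    """Score each frame against the decoder state, normalize with a
    softmax, and return the attention weights and the context vector."""
    scores = np.tanh(H @ U.T + W @ s) @ v     # (T,) additive scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over frames
    context = weights @ H                     # weighted sum of encoder outputs
    return weights, context

weights, context = attend(H, s)
print(round(weights.sum(), 6))                # 1.0
print(context.shape)                          # (42,)
```

The decoder then combines the context vector with its recurrent state to produce the posterior probability of the next phoneme, which is how the model aligns the phoneme sequence with the frame-level features.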