Computer Science (计算机科学), 2021, Vol. 48, Issue 11A: 270-274. doi: 10.11896/jsjkx.210400041
刘晓璇, 季怡, 刘纯平
LIU Xiao-xuan, JI Yi, LIU Chun-ping
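Abstract: Voiceprint recognition exploits individual differences in speakers' biometric characteristics to identify a speaker from his or her voice. Voiceprints are contactless, easy to collect, and stable, which makes them applicable in a wide range of fields. Existing statistical-model methods are limited by the narrow range of features they extract and by weak generalization. In recent years, with the rapid development of deep learning, neural network models have come to the fore in voiceprint recognition. This paper proposes a voiceprint recognition method based on the long short-term memory (LSTM) neural network, which uses spectrograms to extract voiceprint features as the model input and thereby achieves text-independent voiceprint recognition. A spectrogram jointly represents the frequency and energy of a speech signal along the time axis, so the voiceprint features it expresses are richer. LSTM networks are good at capturing temporal features and emphasize information in the time dimension, so they fit the characteristics of speech data better than other neural network models. The paper combines the long-term learning strength of the LSTM network with the temporal features of voiceprint spectrograms. Experimental results show a recognition accuracy of 84.31% on the THCHS-30 speech dataset. In natural environments, for short utterances of 3 s, the method reaches a recognition accuracy of 96.67%, outperforming existing Gaussian mixture model and convolutional neural network methods.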
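The pipeline outlined in the abstract (spectrogram features fed frame by frame into an LSTM, followed by a speaker decision) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sampling rate, frame length, hop size, FFT size, hidden size, number of speakers, and the random untrained weights are all assumptions chosen for illustration, and the input is synthetic noise standing in for a 3 s utterance.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed sampling rate in Hz
N_SPEAKERS = 10       # hypothetical number of enrolled speakers
HIDDEN = 32           # hypothetical LSTM hidden size

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Log-power spectrogram; rows are time frames, columns are frequency bins."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    spec = np.log(power + 1e-10)
    # Per-utterance normalization keeps the LSTM gate inputs in a moderate range.
    return (spec - spec.mean()) / (spec.std() + 1e-8)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(x, W, U, b):
    """Run one LSTM layer over x of shape (T, D); return the final hidden state (H,)."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ W + h @ U + b            # all four gate pre-activations, shape (4H,)
        i, f, g, o = np.split(z, 4)        # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Synthetic stand-in for a 3 s utterance (the short-utterance case in the abstract).
signal = rng.standard_normal(3 * SAMPLE_RATE)
spec = log_spectrogram(signal)             # shape (298, 257) with the defaults above

# Random, untrained parameters; a real system would learn these from labeled speech.
D = spec.shape[1]
W = 0.01 * rng.standard_normal((D, 4 * HIDDEN))
U = 0.01 * rng.standard_normal((HIDDEN, 4 * HIDDEN))
b = np.zeros(4 * HIDDEN)
W_out = 0.01 * rng.standard_normal((HIDDEN, N_SPEAKERS))

probs = softmax(lstm_last_hidden(spec, W, U, b) @ W_out)
predicted_speaker = int(np.argmax(probs))
```

Each spectrogram row is one time step, so the LSTM sees the frequency and energy content of the utterance in temporal order, which is the property the abstract highlights. Because the weights here are untrained, `predicted_speaker` is meaningless; the sketch only shows the data flow and tensor shapes.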