Computer Science, 2021, Vol. 48, Issue 11A: 270-274. doi: 10.11896/jsjkx.210400041

• Image Processing & Multimedia Technology •

Voiceprint Recognition Based on LSTM Neural Network

LIU Xiao-xuan, JI Yi, LIU Chun-ping   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Online: 2021-11-10 Published: 2021-11-12
  • About author: LIU Xiao-xuan, born in 1999, undergraduate. Her main research interests include machine learning and pattern recognition.
    JI Yi, born in 1973, Ph.D., associate professor, is a member of the China Computer Federation. Her main research interests include pattern recognition and computer vision.
  • Supported by:
    Hui-Chun Chin and Tsung-Dao Lee Chinese Undergraduate Research Endowment (CURE), National Natural Science Foundation of China (61773272) and Natural Science Foundation of the Jiangsu Higher Education Institutions of China (19KJA230001).

Abstract: Voiceprint recognition identifies a speaker from his or her voice by exploiting individual differences in biological characteristics. It is widely applicable because it is contactless, the signal is simple to acquire, and the features are stable. Existing statistical methods for voiceprint recognition are limited by features extracted from a single source and by weak generalization ability. In recent years, with the rapid development of artificial intelligence and deep learning, neural networks have been emerging in the field of voiceprint recognition. This paper proposes a text-independent voiceprint recognition method based on a Long Short-Term Memory (LSTM) neural network, using spectrograms to extract the voiceprint features that serve as the model input. A spectrogram comprehensively represents the frequency and energy information of a voice signal along the time axis and therefore expresses richer voiceprint features. An LSTM neural network is good at capturing temporal features and focuses on information in the time dimension, which matches the characteristics of voice data better than other neural network models do. The proposed method effectively combines the long-term learning ability of the LSTM neural network with the sequential nature of voiceprint spectrograms. Experimental results show that the method achieves 84.31% accuracy on the THCHS-30 voice data set. For three-second short voice clips recorded in a natural environment, its accuracy is 96.67%, which is better than that of existing methods such as the Gaussian Mixture Model and the Convolutional Neural Network.
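
The pipeline described in the abstract (spectrogram frames fed to an LSTM, whose final state is mapped to a speaker) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architecture details, so the frame length, hop size, hidden size, number of speakers, and every function and class name below are assumptions chosen for illustration. The weights are random and untrained, so the output probabilities carry no meaning; the sketch only shows how the data flows. It uses the Python standard library alone so that it runs without extra dependencies.

```python
import cmath
import math
import random

def spectrogram(signal, frame_len=64, hop=32):
    """Log-power spectrogram: one row per time frame, one column per frequency bin.
    frame_len and hop are illustrative assumptions, not values from the paper."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(math.log(abs(coeff) ** 2 + 1e-10))
        frames.append(row)
    return frames

def sigmoid(x):
    # Written with tanh so that large negative inputs cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One weight matrix and bias per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {gate: init_matrix(hidden_dim, input_dim + hidden_dim, rng)
                  for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def speaker_probabilities(signal, num_speakers=3, hidden_dim=8, seed=0):
    """Run the frames through an untrained LSTM and softmax the final hidden state."""
    rng = random.Random(seed)
    spec = spectrogram(signal)
    cell = LSTMCell(len(spec[0]), hidden_dim, rng)
    w_out = init_matrix(num_speakers, hidden_dim, rng)
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for frame in spec:  # frames are consumed in temporal order
        h, c = cell.step(frame, h, c)
    logits = matvec(w_out, h)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Usage: a 0.1 s synthetic 440 Hz tone sampled at 8 kHz stands in for real speech.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
probs = speaker_probabilities(tone)
```

In practice the spectrogram would typically come from a signal-processing library and the LSTM from a deep learning framework with trained weights; the hand-written versions here only make each step of the data flow explicit.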

Key words: Deep learning, Long Short-Term Memory, Neural network, Spectrogram, Voiceprint recognition

CLC Number: TP391.4