Computer Science, 2022, Vol. 49, Issue (11A): 211000194-5. doi: 10.11896/jsjkx.211000194

• Image Processing & Multimedia Technology

Study on Voiceprint Recognition Based on Mixed Features of LFBank and FBank

CUI Lin, WANG Zhi-yue   

  1. School of Electronic Information, Xi’an Polytechnic University, Xi’an 710699, China
  • Online:2022-11-10 Published:2022-11-21
  • About author: CUI Lin, born in 1984, lecturer. Her main research interests include speech signal processing and array signal processing.
  • Supported by: National Natural Science Foundation of China (61901347).

Abstract: Speech feature extraction is an important step in voiceprint recognition. The frequency distributions of male and female voices differ considerably, yet existing feature extraction algorithms have not been adapted to the frequency characteristics of each gender. To address this problem, a speech feature extraction algorithm, LFBank, is proposed for female voiceprint recognition. A linear filter bank is introduced into the feature extraction process, and its linearly distributed filters compensate for the traditional Mel filter bank's weakness in extracting information from the high-frequency region. In addition, to overcome the restriction to a single gender and broaden the application scenarios, the advantages of the linear and Mel filter banks are combined: LFBank and FBank features are fused into mixed feature vectors for voiceprint recognition. LFBank is compared with the commonly used FBank and MFCC features, and experimental results show that the feature vector based on the linear filter bank has an advantage in recognizing female voices. In comparison experiments against single features, the mixed features achieve better recognition than any single feature and suit a wider range of application scenarios.
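As a rough illustration of the two filter banks and the mixed feature described in the abstract, the sketch below computes log filter-bank energies for one speech frame using triangular filters spaced either on the Mel scale (standard FBank, with the common HTK-style mapping m = 2595 log10(1 + f/700)) or linearly in Hz (a stand-in for LFBank), and then concatenates the two vectors. The abstract does not specify the exact LFBank design or the fusion rule, so the linear spacing from 0 Hz to the Nyquist frequency, the 40-filter default, and plain concatenation are assumptions, and every function name here is hypothetical. Pre-emphasis and framing are omitted, and a naive DFT replaces the FFT to keep the example standard-library only.

```python
import math

# Hypothetical sketch: FBank (Mel-spaced) vs. an assumed LFBank (linearly
# spaced) log filter-bank energies for one frame, plus a mixed vector.


def hz_to_mel(f):
    """HTK-style Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(n_filters, f_min, f_max):
    """n_filters + 2 edge frequencies (Hz), equally spaced on the Mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


def linear_edges(n_filters, f_min, f_max):
    """n_filters + 2 edge frequencies (Hz), equally spaced in Hz (assumed LFBank)."""
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + i * step for i in range(n_filters + 2)]


def triangular_filterbank(edges, n_fft, sample_rate):
    """Filter k rises from edges[k] to a peak of 1 at edges[k+1], then falls to edges[k+2]."""
    bin_hz = [i * sample_rate / n_fft for i in range(n_fft // 2 + 1)]
    bank = []
    for k in range(len(edges) - 2):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        row = []
        for f in bin_hz:
            if lo <= f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f <= hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft of a Hamming-windowed, zero-padded frame (naive O(N^2) DFT)."""
    n = len(frame)
    x = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) if t < n else 0.0
         for t in range(n_fft)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def log_energies(spec, bank, eps=1e-10):
    """Log of each filter's weighted power; eps avoids log(0) for empty filters."""
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]


def mixed_feature(frame, sample_rate=16000, n_fft=512, n_filters=40):
    """FBank followed by assumed LFBank; fusion by concatenation is an assumption."""
    spec = power_spectrum(frame, n_fft)
    f_max = sample_rate / 2.0
    mel_bank = triangular_filterbank(mel_edges(n_filters, 0.0, f_max), n_fft, sample_rate)
    lin_bank = triangular_filterbank(linear_edges(n_filters, 0.0, f_max), n_fft, sample_rate)
    return log_energies(spec, mel_bank) + log_energies(spec, lin_bank)


if __name__ == "__main__":
    sr = 16000
    frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(400)]  # 25 ms test tone
    feat = mixed_feature(frame, sample_rate=sr)
    print(len(feat))  # 80: 40 FBank values followed by 40 LFBank values
```

Comparing `mel_edges` with `linear_edges` shows the motivation stated in the abstract: with 40 filters at 16 kHz, the highest Mel filter spans roughly 1 kHz, while the highest linear filter spans about 390 Hz. High-frequency detail is therefore averaged more coarsely by the Mel bank, and the Mel bank in turn resolves low frequencies more finely. Concatenation is only the simplest way to keep both views; the paper's actual fusion may differ.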

Key words: Voiceprint recognition, Feature extraction, Sound frequency, Linear filter banks, Mel filter banks, Mixed feature

CLC Number: TN912