Study on Voiceprint Recognition Based on Mixed Features of LFBank and FBank

doi:10.11896/jsjkx.211000194

Abstract

Speech feature extraction is an important step in voiceprint recognition. The frequency distributions of male and female voices differ considerably, yet existing feature extraction algorithms have not been adapted to the frequency characteristics of each gender. To address this, a speech feature extraction algorithm designed for female voiceprint recognition, LFBank, is proposed. It introduces a linear filter bank into the feature extraction process and uses the filters' linear distribution to compensate for the weakness of the traditional Mel filter bank in extracting information from the high-frequency region. In addition, to overcome the single-gender limitation and broaden the application scenarios, the advantages of the linear and Mel filter banks are combined: LFBank and FBank features are joined into a mixed feature vector for voiceprint recognition. LFBank is compared experimentally with the commonly used FBank and MFCC features, and the results show that the feature vector based on the linear filter bank has an advantage in recognizing female voices. In comparison experiments against single features, the mixed feature achieves better recognition performance than any single feature and suits a wider range of application scenarios.

Key words: Voiceprint recognition, Feature extraction, Sound frequency, Linear filter banks, Mel filter banks, Mixed feature
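The contrast between the two filter banks can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameter values, so the sample rate, the number of filters, the frame length and hop, the triangular filter shape, and every function name below are assumptions, and joining the two features by simple concatenation is only one plausible reading of "mixed feature vector".

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula; the paper may use a different variant.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filter_bank(edges_hz, n_fft, sample_rate):
    """Build triangular filters from n_filters + 2 ascending edge frequencies."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_bank(n_filters, n_fft, sample_rate):
    # Edges equally spaced on the Mel scale: dense at low frequencies.
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    return triangular_filter_bank(mel_to_hz(mel_edges), n_fft, sample_rate)


def linear_bank(n_filters, n_fft, sample_rate):
    # Edges equally spaced in Hz: uniform resolution up to Nyquist.
    edges = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    return triangular_filter_bank(edges, n_fft, sample_rate)


def log_bank_features(signal, bank, n_fft, frame_len, hop):
    """Frame the signal, take the power spectrum, apply the bank, take the log."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    return np.log(power @ bank.T + 1e-10)  # small offset avoids log(0)


def mixed_features(signal, sample_rate=16000, n_filters=40,
                   n_fft=512, frame_len=400, hop=160):
    # All default values are illustrative assumptions, not taken from the paper.
    fbank = log_bank_features(
        signal, mel_bank(n_filters, n_fft, sample_rate), n_fft, frame_len, hop)
    lfbank = log_bank_features(
        signal, linear_bank(n_filters, n_fft, sample_rate), n_fft, frame_len, hop)
    # Assumed fusion rule: per-frame concatenation of LFBank and FBank.
    return np.concatenate([lfbank, fbank], axis=1)
```

Calling `mixed_features` on a one-dimensional waveform returns one row per frame and `2 * n_filters` columns. With these assumed settings, the linear bank places noticeably more of its filters in the upper half of the spectrum than the Mel bank does, which is the property the abstract credits for the improvement on female voices.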

CLC Number: TN912

CUI Lin, WANG Zhi-yue. Study on Voiceprint Recognition Based on Mixed Features of LFBank and FBank[J]. Computer Science, 2022, 49(11A): 211000194-5. https://doi.org/10.11896/jsjkx.211000194

