Computer Science, 2024, Vol. 51, Issue (11A): 231000065-6. doi: 10.11896/jsjkx.231000065

• Image Processing & Multimedia Technology

Language Recognition Based on Improved MFCC and Energy Operator Cepstrum

CHEN Sizhu1,2, LONG Hua1, SHAO Yubin1

  1 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2 Radio Monitoring Center of Yunnan Province, Kunming 650228, China
  • Online: 2024-11-16 Published: 2024-11-13
  • Corresponding author: LONG Hua (1670931890@qq.com)
  • About author: CHEN Sizhu (1652720478@qq.com), born in 1996, postgraduate. Her main research interests include wireless signal processing and language recognition.
    LONG Hua, born in 1963, Ph.D., professor, is a member of CCF (No. B3460M). Her main research interests include audio signal processing and analysis, big data and wireless networks.
  • Supported by:
    Yunnan Key Laboratory of Media Convergence Open Fund (320225403).

Abstract: To address the low accuracy and poor robustness of language recognition for broadcast speech at low signal-to-noise ratio (SNR), this paper proposes a language recognition algorithm that combines MFCC features improved by the wavelet packet transform with energy operator cepstrum features. First, the wavelet packet transform replaces the Fourier transform and Mel filtering in MFCC extraction, yielding the WMFCC feature parameters. While retaining the auditory perception characteristics of the human ear, this improves the high-frequency analysis capability and precision for speech signals and overcomes the limitations of the Fourier transform. Second, the Teager energy operator cepstrum is extracted to capture the instantaneous energy characteristics of speech and is fused with the improved MFCC parameters to form a new feature, TWMFCC. Finally, to further improve recognition of low-SNR speech, a denoising algorithm based on variational mode decomposition (VMD) and adaptive Wiener filtering is proposed. Experiments compare the recognition performance of the proposed features with that of traditional features. The proposed features significantly improve average recognition accuracy: on noisy speech without denoising, accuracy is 13.02% higher than with traditional MFCC. The method thus effectively alleviates the low recognition accuracy of traditional features at low SNR and shows strong noise resistance and robustness.

Key words: Language recognition, MFCC, Wavelet packet transform, Energy operator cepstrum, GMM-UBM
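The first step of the method swaps the Fourier transform and Mel filterbank of MFCC for a wavelet packet transform. The abstract does not state the decomposition tree, wavelet or frame length, so the following is only a minimal sketch under assumptions: a uniform Haar wavelet packet tree (a Mel-like design would split the low-frequency bands more finely), log subband energies taken in frequency order, and an orthonormal DCT-II. The names `haar_wp_subbands`, `dct2_ortho` and `wmfcc` are hypothetical.

```python
import numpy as np


def haar_wp_subbands(frame, level):
    """Full Haar wavelet packet decomposition to `level`, returned in frequency order.

    Assumption: uniform tree with the orthonormal Haar filter pair;
    len(frame) must be divisible by 2**level.
    """
    bands = [np.asarray(frame, dtype=float)]
    for _ in range(level):
        children = []
        for i, band in enumerate(bands):
            low = (band[0::2] + band[1::2]) / np.sqrt(2.0)
            high = (band[0::2] - band[1::2]) / np.sqrt(2.0)
            # High-pass filtering plus decimation mirrors the spectrum, so every
            # odd (mirrored) node emits its children in swapped order.
            children.extend([low, high] if i % 2 == 0 else [high, low])
        bands = children
    return bands


def dct2_ortho(v):
    """Orthonormal DCT-II of a 1-D array, built from its cosine basis."""
    v = np.asarray(v, dtype=float)
    n_pts = len(v)
    n = np.arange(n_pts)
    # basis[k, n] = cos(pi * (n + 0.5) * k / N)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / n_pts)
    scale = np.full(n_pts, np.sqrt(2.0 / n_pts))
    scale[0] = np.sqrt(1.0 / n_pts)
    return scale * (basis @ v)


def wmfcc(frame, level=4, n_ceps=12):
    """Cepstrum of log wavelet packet subband energies (stand-in for FFT + Mel)."""
    bands = haar_wp_subbands(frame, level)
    energies = np.array([np.sum(b ** 2) for b in bands])
    return dct2_ortho(np.log(energies + 1e-10))[:n_ceps]
```

Because the Haar pair is orthonormal, the subband energies of a frame sum to the frame energy, and with `level=3` a tone at 0.9π falls in the top (eighth) subband. Swapping the children of odd nodes keeps the subbands sorted by frequency, so the log energies form an ordered spectral envelope before the DCT.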
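The second step uses the discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)x(n+1), to capture instantaneous energy. The abstract does not say how the TEO output becomes a cepstrum or how it is fused with WMFCC, so this sketch assumes TEO → Hamming window → |FFT| → log → DCT-II (no filterbank) and fusion by frame-wise concatenation. `teager`, `teo_cepstrum` and `fuse_features` are hypothetical names.

```python
import numpy as np


def teager(x):
    """Discrete Teager energy operator on interior samples: x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def dct2_ortho(v):
    """Orthonormal DCT-II of a 1-D array."""
    v = np.asarray(v, dtype=float)
    n_pts = len(v)
    n = np.arange(n_pts)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / n_pts)
    scale = np.full(n_pts, np.sqrt(2.0 / n_pts))
    scale[0] = np.sqrt(1.0 / n_pts)
    return scale * (basis @ v)


def teo_cepstrum(frame, n_ceps=12, n_fft=512):
    """Assumed TEO cepstrum: TEO -> Hamming window -> |FFT| -> log -> DCT-II."""
    psi = teager(frame)
    psi = psi * np.hamming(len(psi))  # window the TEO output, not the raw frame
    spectrum = np.abs(np.fft.rfft(psi, n=n_fft))
    return dct2_ortho(np.log(spectrum + 1e-10))[:n_ceps]


def fuse_features(wmfcc_frames, teocc_frames):
    """Assumed fusion: concatenate two (n_frames, n_ceps) matrices frame by frame."""
    return np.hstack([wmfcc_frames, teocc_frames])
```

For a pure tone A·cos(Ωn + φ) the operator returns the constant A²·sin²Ω at every interior sample, which is why it tracks amplitude and frequency jointly rather than amplitude alone.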
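The denoising step pairs VMD with adaptive Wiener filtering. The abstract does not describe how modes are selected or how the filter adapts, so this sketch shows only a standard local-statistics adaptive Wiener filter (the same idea as SciPy's `scipy.signal.wiener`) that could be applied to each VMD mode; the VMD decomposition, mode selection and reconstruction are omitted. The window length and the noise-variance estimate (mean of local variances) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def local_mean(x, win):
    """Moving average over an odd-length window with reflected edges."""
    if win % 2 == 0:
        raise ValueError("win must be odd")
    pad = win // 2
    padded = np.pad(x, pad, mode="reflect")
    return np.convolve(padded, np.ones(win) / win, mode="valid")


def adaptive_wiener(x, win=15, noise_var=None):
    """Local-statistics adaptive Wiener filter.

    y[n] = mu[n] + max(var[n] - noise_var, 0) / var[n] * (x[n] - mu[n]),
    where mu and var are the local mean and variance; noise_var defaults
    to the mean of the local variances.
    """
    x = np.asarray(x, dtype=float)
    mu = local_mean(x, win)
    var = np.maximum(local_mean(x * x, win) - mu ** 2, 0.0)
    if noise_var is None:
        noise_var = float(np.mean(var))
    # Gain is near 0 where local variance looks like noise, near 1 where signal dominates.
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return mu + gain * (x - mu)
```

In flat, noise-dominated regions the output falls back to the local mean, while high-variance regions (speech onsets, strong harmonics) pass through with little smoothing, which is the adaptive behaviour a fixed Wiener gain lacks.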
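The keywords name GMM-UBM as the classifier back end. No model size or adaptation settings are given, so this is a minimal sketch of the classic recipe under assumptions: a diagonal-covariance UBM trained by EM on pooled frames from all languages (with a simple farthest-point initialisation in place of the more usual k-means), MAP adaptation of weights and means per language with relevance factor 16, and scoring by the average per-frame log-likelihood ratio against the UBM; the language with the highest score wins. All function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """(T, K) matrix of log N(x_t | mu_k, diag(var_k))."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (X.shape[1] * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def posteriors(X, gmm):
    """Component posteriors (T, K) and per-frame log-likelihoods (T,), via log-sum-exp."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    peak = log_p.max(axis=1, keepdims=True)
    log_norm = peak + np.log(np.exp(log_p - peak).sum(axis=1, keepdims=True))
    return np.exp(log_p - log_norm), log_norm[:, 0]


def train_ubm(X, n_comp, n_iter=50, seed=0):
    """EM for a diagonal-covariance GMM (assumed farthest-point initialisation)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, n_comp):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0), (n_comp, 1))
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        gamma, _ = posteriors(X, (weights, means, variances))
        nk = gamma.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = gamma.T @ X / nk[:, None]
        # Variance floor keeps components from collapsing onto single frames.
        variances = np.maximum(gamma.T @ X ** 2 / nk[:, None] - means ** 2, 1e-3)
    return weights, means, variances


def map_adapt(X, ubm, relevance=16.0):
    """MAP-adapt UBM weights and means to one language's frames (variances kept)."""
    weights, means, variances = ubm
    gamma, _ = posteriors(X, ubm)
    nk = gamma.sum(axis=0)
    ek = gamma.T @ X / np.maximum(nk, 1e-10)[:, None]
    alpha = nk / (nk + relevance)  # data-dependent adaptation coefficient
    new_w = alpha * nk / len(X) + (1 - alpha) * weights
    new_means = alpha[:, None] * ek + (1 - alpha[:, None]) * means
    return new_w / new_w.sum(), new_means, variances


def llr(X, model, ubm):
    """Average per-frame log-likelihood ratio of a language model against the UBM."""
    return float(np.mean(posteriors(X, model)[1] - posteriors(X, ubm)[1]))
```

Adapting from a shared UBM means components unseen in a language's data keep their UBM values, so a modest amount of per-language data only moves the components it actually exercises.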

CLC number: TN912.34