Computer Science (计算机科学), 2024, Vol. 51, Issue (11A): 231000065-6. doi: 10.11896/jsjkx.231000065
CHEN Sizhu1,2, LONG Hua1, SHAO Yubin1
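Abstract: To address the low accuracy and poor robustness of language identification for broadcast speech signals at low signal-to-noise ratios (SNR), this paper proposes a language identification algorithm based on MFCC improved by the wavelet packet transform and on energy operator cepstral features. First, the wavelet packet transform replaces the Fourier transform and Mel filtering in MFCC to obtain the WMFCC feature parameters. This retains the perceptual characteristics of human hearing while improving the high-frequency analysis capability and analysis precision for speech signals, overcoming the limitations of the Fourier transform. Second, the Teager energy operator cepstrum is extracted to capture the instantaneous energy characteristics of speech, and it is fused with the improved MFCC feature parameters to obtain a new feature parameter, TWMFCC. Finally, to further improve identification of low-SNR speech, a VMD adaptive Wiener filtering denoising algorithm is proposed. Experiments compare the recognition performance of the proposed features with that of traditional features. The average recognition accuracy of the proposed features improves significantly: for noisy speech without any denoising, it is 13.02% higher than that of traditional MFCC. The proposed features thus effectively alleviate the low recognition accuracy of traditional features at low SNR and show strong noise resistance and robustness.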
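As an illustration of the instantaneous-energy step mentioned in the abstract, the following is a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' implementation: the function name `teager_energy` and the test-tone parameters are hypothetical, and the later cepstral steps (filtering, logarithm, transform) are omitted because the abstract does not specify them.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample
    (an empty list if x has fewer than 3 samples).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (hypothetical values): a pure tone A * cos(omega * n).
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)

# For a pure tone the operator is constant and equals A^2 * sin(omega)^2,
# so it tracks both amplitude and frequency of the signal.
expected = A * A * math.sin(omega) ** 2
```

For a pure tone the output is constant, which is why the operator is often described as tracking instantaneous energy; on real speech frames it varies sample by sample.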