Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231000065-6. doi: 10.11896/jsjkx.231000065

• Image Processing & Multimedia Technology •

Language Recognition Based on Improved MFCC and Energy Operator Cepstrum

CHEN Sizhu1,2, LONG Hua1, SHAO Yubin1   

  1 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2 Radio Monitoring Center of Yunnan Province, Kunming 650228, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: CHEN Sizhu, born in 1996, postgraduate. Her main research interests include wireless signal processing and language recognition.
    LONG Hua, born in 1963, Ph.D., professor, is a member of CCF (No. B3460M). Her main research interests include audio signal processing and analysis, big data, and wireless networks.
  • Supported by:
    Yunnan Key Laboratory of Media Convergence Open Fund(320225403).

Abstract: To address the low accuracy and poor robustness of language recognition for broadcast speech signals at low signal-to-noise ratio (SNR), a language recognition algorithm is proposed that uses the wavelet packet transform to improve MFCC and combines it with energy operator cepstrum features. First, WMFCC feature parameters are obtained by using the wavelet packet transform in place of the Fourier transform and Mel filter in MFCC. While retaining the auditory perception characteristics of the human ear, this improves the high-frequency analysis ability and analysis accuracy for the speech signal and overcomes the limitations of the Fourier transform. Second, the Teager energy operator cepstrum is extracted to capture the instantaneous energy of the speech and is fused with the improved MFCC feature parameters to obtain a new feature parameter, TWMFCC. Finally, to further improve recognition of low-SNR speech, a VMD adaptive Wiener filtering denoising algorithm is proposed. Experiments compare the recognition performance of the proposed features with that of traditional features. The average recognition accuracy of the proposed features is significantly improved, being 13.02% higher than that of traditional MFCC without speech denoising. The proposed method effectively alleviates the low recognition accuracy of traditional features at low SNR and shows strong noise immunity and robustness.
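
The Teager energy operator named in the abstract can be illustrated with its standard discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The following is only a minimal sketch of that textbook operator, not the authors' implementation: the function name `teager_energy`, the test tone, and its parameters are assumptions chosen for illustration, and the cepstrum extraction and fusion steps of the paper are not reproduced here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration; returns values for interior
    samples only, so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    # The first and last samples lack a neighbour on one side, so skip them.
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Sanity check on a pure tone (assumed parameters): for
# x[n] = A * cos(omega * n + phi) the operator is constant and
# equals A^2 * sin(omega)^2, which tracks amplitude and frequency together.
amplitude, omega = 0.5, 0.3
n = np.arange(200)
tone = amplitude * np.cos(omega * n + 0.1)
psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

Because the operator needs only three neighbouring samples, it responds to rapid changes in amplitude and frequency, which is consistent with the abstract's description of it as capturing instantaneous energy.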

Key words: Language recognition, MFCC, Wavelet packet transform, Energy operator cepstrum, GMM-UBM

CLC Number: 

  • TN912.34