Computer Science (计算机科学), 2018, Vol. 45, Issue (4): 278-284. doi: 10.11896/j.issn.1002-137X.2018.04.047

• Graphics, Image and Pattern Recognition •

Automatic Detection of Hypernasality Grades Based on Discrete Wavelet Transformation and Cepstrum Analysis

ZHAO Li-bo, LIU Qi, FU Fang-ling and HE Ling

  1. School of Electrical Engineering and Information, Sichuan University, Chengdu 610065
  • Online: 2018-04-15 Published: 2018-05-11
  • Supported by:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61503264).

Abstract: This paper proposes an algorithm for automatically classifying hypernasality grades in cleft palate speech, based on the cepstrum of discrete wavelet decomposition coefficients. It draws on a combined study of wavelet processing and feature extraction methods for speech signals. Features currently used to classify hypernasality grades include MFCC, Teager energy, and Shannon energy, but they yield low classification accuracy at a high computational cost. The speech data tested in this work comprise 1789 Mandarin syllables with the final /a/, spoken by cleft palate patients with four grades of hypernasality. The wavelet decomposition coefficient cepstrum is extracted as the acoustic feature, and a KNN classifier is applied to identify the four grades. Its classification performance is compared with that of five classical acoustic features: MFCC, LPCC, pitch period, formant, and short-time energy. An SVM classifier is also applied to the same task and compared with KNN. The experimental results indicate that the wavelet decomposition coefficient cepstrum achieves higher recognition accuracy than the five classical acoustic features, and that KNN outperforms SVM. With the wavelet decomposition coefficient cepstrum, recognition accuracy reaches up to 91.67% with KNN and 87.60% with SVM; with the classical acoustic features, it ranges from 21.69% to 84.54% with KNN and from 30.61% to 78.24% with SVM.

Key words: Cleft palate, Hypernasality, Recognition system, Wavelet decomposition coefficient cepstrum
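
The pipeline summarized in the abstract (wavelet decomposition, a cepstrum-style feature, then KNN classification) can be illustrated with a minimal sketch. The abstract does not specify the wavelet basis, the decomposition depth, or the exact cepstrum definition, so everything below is an assumption made for illustration: a Haar wavelet, three levels, log sub-band energies followed by a DCT, and synthetic tones standing in for vowel recordings. All function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def haar_dwt(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # a trailing odd sample is dropped
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_subbands(signal, levels):
    """Multi-level decomposition: [detail_1, ..., detail_L, approx_L]."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def dct2(values):
    """Unnormalised DCT-II of a short vector."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]

def wavelet_cepstrum(signal, levels=3, eps=1e-12):
    """Assumed feature: DCT of the log energy of each wavelet sub-band."""
    log_energy = [math.log(sum(c * c for c in band) + eps)
                  for band in wavelet_subbands(signal, levels)]
    return dct2(log_energy)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def tone(freq, n=64, phase=0.0):
    """Synthetic stand-in for a speech frame (NOT real cleft palate data)."""
    return [math.sin(2 * math.pi * freq * t / n + phase) for t in range(n)]

# Toy training set: two classes of tones, two phases each.
train_x = [wavelet_cepstrum(tone(f, phase=p))
           for f in (2, 3, 21, 27) for p in (0.0, 0.5)]
train_y = ["low"] * 4 + ["high"] * 4

prediction = knn_predict(train_x, train_y, wavelet_cepstrum(tone(23, phase=0.3)))
print(prediction)
```

In this toy setting the classes differ only in where the signal energy falls across sub-bands, which is the kind of information a sub-band cepstrum is meant to capture. Reproducing the reported accuracies would require the authors' data, wavelet choice, and feature definition, none of which are given here.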

