Computer Science ›› 2024, Vol. 51 ›› Issue (8): 11-19. doi: 10.11896/jsjkx.230700161

• Discipline Frontiers •

  • Corresponding author: DONG Xueyan (tjtxueyan@buu.edu.cn)
  • First author's email: kxc4088@163.com

Advancements and Prospects in Dysarthria Speaker Adaptation

KANG Xinchen1, DONG Xueyan1, YAO Dengfeng1,2,3, ZHONG Jinghua1   

  1 Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
    2 Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing 100084, China
    3 Center for Psychology and Cognitive Science, Tsinghua University, Beijing 100084, China
  • Received:2023-07-20 Revised:2023-12-02 Online:2024-08-15 Published:2024-08-13
  • About author: KANG Xinchen, born in 1996, postgraduate, is a member of CCF (No. P5697G). Her main research interests include information accessibility and speech recognition.
    DONG Xueyan, born in 1986, Ph.D, senior lecturer. Her main research interests include information accessibility and speech recognition.
  • Supported by:
    Natural Science Foundation of Beijing, China (4202028), General Project of the National Language Committee (YB145-25), National Natural Science Foundation of China (62036001), National Social Science Foundation of China (21BYY106, 21&ZD292) and 2019 Science and Technology Plan of Beijing Municipal Education Commission (KM201911417005).


Abstract: Automatic speech recognition tools make communication between people with dysarthria and typical speakers smoother, so dysarthric speech recognition has become a hot research topic in recent years. Research on dysarthric speech recognition involves collecting pronunciation data from dysarthric and typical speakers, representing the acoustic features of dysarthric and typical speech, comparing and recognizing the spoken content with machine learning models, and locating the differences, so as to help people with dysarthria improve their pronunciation. However, because collecting a large amount of speech data from dysarthric speakers is very difficult and their pronunciation is highly variable, general-purpose speech recognition models often perform poorly on dysarthric speech. To address this issue, many studies have introduced speaker adaptation methods into dysarthric speech recognition. An extensive survey of the relevant literature shows that current research mainly analyzes dysarthric speech in the feature domain and the model domain. This paper focuses on how feature transformation and auxiliary features address the differential representation of speech features, and on how linear transformation of acoustic models, fine-tuning of acoustic model parameters, and domain adaptation methods based on data selection improve recognition accuracy. Finally, the open problems in dysarthria speaker adaptation research are summarized, and it is pointed out that future work can improve the effectiveness of dysarthric speech recognition models by analyzing speech variability, fusing multi-feature and multi-modal data, and developing adaptation methods that require only small amounts of data.
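The model-domain adaptation the abstract describes (fine-tuning acoustic model parameters on a target speaker's small dataset) can be illustrated with a minimal numerical sketch. Everything here is an illustrative assumption, not the paper's method: the synthetic features, the two-class "phone" labels, and the single softmax layer stand in for a real acoustic model and corpus.

```python
# Sketch: a "speaker-independent" softmax layer is trained on typical speech
# features, then fine-tuned on a handful of frames from one dysarthric
# speaker whose features are shifted and rescaled. All data is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def train_layer(W, X, y, lr=0.5, steps=200):
    """Gradient descent on a single softmax layer (the only trainable part)."""
    for _ in range(steps):
        p = softmax(X @ W)
        p[np.arange(len(y)), y] -= 1.0      # softmax gradient: p - one_hot(y)
        W -= lr * (X.T @ p) / len(y)
    return W

# Speaker-independent training: 2 phone classes, clean features.
X_si = rng.normal(size=(200, 4))
y_si = (X_si[:, 0] > 0).astype(int)
W = train_layer(np.zeros((4, 2)), X_si, y_si)

# One dysarthric speaker: same phones, but features shifted and rescaled,
# so the speaker-independent decision boundary no longer fits.
X_dys = rng.normal(size=(20, 4)) * 1.5 + np.array([1.0, -0.5, 0.3, 0.0])
y_dys = (X_dys[:, 0] > 1.0).astype(int)

loss_before = cross_entropy(softmax(X_dys @ W), y_dys)
W_adapted = train_layer(W.copy(), X_dys, y_dys, lr=0.2, steps=100)
loss_after = cross_entropy(softmax(X_dys @ W_adapted), y_dys)
print(f"loss before adaptation: {loss_before:.3f}, after: {loss_after:.3f}")
```

In real systems only a small subset of parameters (an output layer, learning-hidden-unit contributions, or adapter modules) is typically updated, precisely because dysarthric adaptation data is scarce; the sketch mirrors that by updating a single layer.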

Key words: Dysarthria, Speaker adaptation, Auxiliary features, Transformation, Fine-tuning, Domain adaptation
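The feature-domain side (transforming a speaker's features toward a canonical space) can likewise be sketched as an affine transform estimated by ordinary least squares, loosely in the spirit of fMLLR-style linear feature transforms. The dimensions, the synthetic distortion, and the availability of paired frames are all illustrative assumptions.

```python
# Sketch: estimate an affine map that undoes a speaker-specific distortion,
# pulling dysarthric feature frames back toward the canonical feature space.
import numpy as np

rng = np.random.default_rng(1)

# Canonical (speaker-independent) frames, and the same frames as produced by
# one dysarthric speaker: an unknown affine distortion plus noise.
X_canon = rng.normal(size=(300, 3))
A_true = np.array([[1.3, 0.2, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.1, 0.0, 1.1]])
b_true = np.array([0.5, -0.2, 0.1])
X_dys = X_canon @ A_true.T + b_true + 0.05 * rng.normal(size=(300, 3))

# Solve min ||X_aug @ M - X_canon||^2 for the de-distorting transform M.
X_aug = np.hstack([X_dys, np.ones((len(X_dys), 1))])  # append bias column
M, *_ = np.linalg.lstsq(X_aug, X_canon, rcond=None)

X_adapted = X_aug @ M
err_before = np.mean((X_dys - X_canon) ** 2)
err_after = np.mean((X_adapted - X_canon) ** 2)
print(f"mean squared mismatch before: {err_before:.3f}, after: {err_after:.4f}")
```

Actual fMLLR estimates its transform by maximizing HMM likelihood rather than by paired-frame least squares (paired frames are rarely available in practice), so this sketch conveys only the geometric intuition of feature-space normalization.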

CLC number: TP183