Computer Science ›› 2024, Vol. 51 ›› Issue (8): 11-19. doi: 10.11896/jsjkx.230700161

• Discipline Frontier •

Advancements and Prospects in Dysarthria Speaker Adaptation

KANG Xinchen1, DONG Xueyan1, YAO Dengfeng1,2,3, ZHONG Jinghua1   

  1 Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
    2 Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing 100084, China
    3 Center for Psychology and Cognitive Science, Tsinghua University, Beijing 100084, China
  • Received: 2023-07-20  Revised: 2023-12-02  Online: 2024-08-15  Published: 2024-08-13
  • About author: KANG Xinchen, born in 1996, postgraduate, is a member of CCF (No.P5697G). Her main research interests include information accessibility and speech recognition.
    DONG Xueyan, born in 1986, Ph.D, senior lecturer. Her main research interests include information accessibility and speech recognition.
  • Supported by:
    Natural Science Foundation of Beijing, China (4202028), General Project of the National Language Committee (YB145-25), National Natural Science Foundation of China (62036001), National Social Science Foundation of China (21BYY106, 21&ZD292) and 2019 Science and Technology Plan of Beijing Municipal Education Commission (KM201911417005).

Abstract: Automatic speech recognition tools can make communication between people with dysarthria and typical speakers smoother, so dysarthric speech recognition has become a hot research topic in recent years. Research on dysarthric speech recognition involves collecting speech data from speakers with dysarthria and typical speakers, representing the acoustic features of dysarthric and typical speech, recognizing and comparing the spoken content with machine learning models, and locating pronunciation differences so as to help speakers with dysarthria improve their pronunciation. However, because large amounts of dysarthric speech are difficult to collect and dysarthric pronunciation is highly variable, general-purpose speech recognition models often perform poorly on such speech. To address this issue, many studies introduce speaker adaptation methods into dysarthric speech recognition. An extensive review of the literature shows that current research mainly analyzes dysarthric speech in the feature domain and the model domain. This paper analyzes how feature transformation and auxiliary features address the divergent representation of speech features, and how linear transformation of acoustic models, fine-tuning of acoustic model parameters, and domain adaptation based on data selection improve recognition accuracy. Finally, the open problems in dysarthric speaker adaptation research are summarized, and it is pointed out that future work can improve dysarthric speech recognition models by analyzing speech variability, fusing multi-feature and multi-modal data, and applying speaker adaptation methods that require only small amounts of data.
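To make the model-domain adaptation idea above concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of fine-tuning a pretrained acoustic model on a small amount of data from a single speaker with dysarthria: the speaker-independent encoder is frozen and only the output layer is updated, one common way to adapt when adaptation data is scarce. It assumes PyTorch; the model architecture, feature dimensions, and synthetic data are illustrative placeholders, not the systems surveyed in the paper.

```python
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    """Toy frame-level acoustic model: a shared encoder plus a phone-output layer."""
    def __init__(self, feat_dim=40, hidden_dim=256, num_phones=42):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.output = nn.Linear(hidden_dim, num_phones)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        return self.output(self.encoder(feats))  # logits: (batch, frames, num_phones)

def adapt_to_speaker(model, feats, targets, lr=1e-4, epochs=5):
    """Fine-tune only the output layer on a small per-speaker adaptation set."""
    for p in model.encoder.parameters():         # freeze the speaker-independent encoder
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.output.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(feats)
        loss = criterion(logits.flatten(0, 1), targets.flatten())
        loss.backward()
        optimizer.step()
    return model

if __name__ == "__main__":
    torch.manual_seed(0)
    model = AcousticModel()                   # stands in for a pretrained speaker-independent model
    feats = torch.randn(4, 100, 40)           # 4 adaptation utterances, 100 frames, 40-dim features
    targets = torch.randint(0, 42, (4, 100))  # synthetic frame-level phone labels
    adapt_to_speaker(model, feats, targets)
```

Updating only a small subset of parameters (here the output layer) is one way to limit overfitting on a few adaptation utterances; the surveyed approaches instead use techniques such as linear transforms of model parameters, auxiliary speaker features, or data-selection-based domain adaptation.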

Key words: Dysarthria, Speaker adaptation, Auxiliary features, Transformation, Fine-tuning, Domain adaptation

CLC Number: TP183