Computer Science (计算机科学) ›› 2025, Vol. 52 ›› Issue (8): 86-99. doi: 10.11896/jsjkx.240900009

• Database & Big Data & Data Science •


Survey on Data Processing and Data Augmentation in Low-resource Language Automatic Speech Recognition

YANG Jian1,2, SUN Liu1,2, ZHANG Lifang1   

  1. College of Engineering,Yuxi Normal University,Yuxi,Yunnan 653100,China
    2. Yunnan Provincial Key Laboratory of Cyberspace Security for Smart Cities,Yuxi Normal University,Yuxi,Yunnan 653100,China
  • Received:2024-09-02 Revised:2024-11-12 Online:2025-08-15 Published:2025-08-08
  • About author:YANG Jian,born in 1976,Ph.D,associate professor,is a member of CCF(No.14480M).His main research interests include speech recognition and deep learning.
  • Supported by:
    National Natural Science Foundation of China(62266048,62466060).


Abstract: Due to the scarcity of transcribed speech data,applying end-to-end automatic speech recognition(ASR) technology to low-resource languages is challenging,making low-resource language ASR a prominent research topic in NLP.Research on ASR in low-resource settings can be approached from two main aspects:data augmentation and model improvement.This paper focuses on the processing of training data in low-resource language ASR and summarizes important recent research results in this field from the perspectives of data augmentation,sample processing,and feature engineering.Different types of data augmentation schemes are analyzed,with an emphasis on the utilization of unpaired speech and unpaired text.The feature engineering of ASR in low-resource scenarios is analyzed and summarized from aspects such as feature extraction,embedding,and fusion.Finally,additional issues such as the construction of low-resource speech corpora are discussed,and important directions for further research on data augmentation for low-resource language ASR are outlined.

Key words: Low-resource, Automatic speech recognition, Data augmentation, Feature representation
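The spectrogram-level data augmentation family surveyed here can be illustrated with a minimal NumPy sketch of SpecAugment-style time and frequency masking, in which random bands of a log-mel spectrogram are blanked out before training. The function name, mask counts, and mask widths below are illustrative assumptions, not values taken from any system discussed in the paper:

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Mask random frequency bands and time spans of a (freq x time)
    spectrogram, filling them with the spectrogram mean."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    fill = out.mean()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):   # frequency masking
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w)))
        out[f0:f0 + w, :] = fill
    for _ in range(num_time_masks):   # time masking
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w)))
        out[:, t0:t0 + w] = fill
    return out

# e.g. an 80-mel x 300-frame log-mel spectrogram
augmented = spec_augment(np.random.rand(80, 300))
```

Because the masks are cheap to sample on the fly, such augmentation is typically applied per utterance during training rather than precomputed, which is part of why it is attractive in low-resource settings.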

CLC Number: TP391