Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700138-6. doi: 10.11896/jsjkx.240700138
• Large Language Model Technology and Its Application •
ZOU Rui, YANG Jian, ZHANG Kai
References:
[1] WANG Y, SKERRY-RYAN R J, STANTON D, et al. Tacotron: Towards End-to-End Speech Synthesis[C]//Interspeech. 2017.
[2] SHEN J, PANG R, WEISS R J, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018). IEEE, 2018.
[3] PING W, PENG K, GIBIANSKY A, et al. Deep Voice 3: Scaling text-to-speech with convolutional sequence learning[J]. arXiv:1710.07654, 2017.
[4] REN Y, RUAN Y, TAN X, et al. FastSpeech: Fast, robust and controllable text to speech[C]//Advances in Neural Information Processing Systems. 2019.
[5] REN Y, HU C, TAN X, et al. FastSpeech 2: Fast and high-quality end-to-end text to speech[J]. arXiv:2006.04558, 2020.
[6] PING W, PENG K, CHEN J. ClariNet: Parallel wave generation in end-to-end text-to-speech[J]. arXiv:1807.07281, 2018.
[7] DONAHUE J, DIELEMAN S, BIŃKOWSKI M, et al. End-to-end adversarial text-to-speech[J]. arXiv:2006.03575, 2020.
[8] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.
[9] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27.
[10] KIM J, KIM S, KONG J, et al. Glow-TTS: A generative flow for text-to-speech via monotonic alignment search[J]. Advances in Neural Information Processing Systems, 2020, 33: 8067-8077.
[11] POPOV V, VOVK I, GOGORYAN V, et al. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech[C]//International Conference on Machine Learning. 2021.
[12] CHEN J, SONG X, PENG Z, et al. LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). 2023: 1-5.
[13] LU C, ZHOU Y, BAO F, et al. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps[J]. Advances in Neural Information Processing Systems, 2022, 35: 5775-5787.
[14] LIANG Z, SHI H, WANG J, et al. EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech[J]. arXiv:2403.08164, 2024.
[15] JEONG M, KIM M, CHOI B J, et al. Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
[16] LAM T Q, et al. Instance-based transfer learning approach for Vietnamese speech synthesis with very low resource[C]//Future of Information and Communication Conference. Cham: Springer International Publishing, 2022.
[17] PHUN V L. Data processing for optimizing naturalness of Vietnamese text-to-speech system[C]//2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA). IEEE, 2020.
[18] NGUYEN L T, PHAM T, NGUYEN D Q. XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech[J]. arXiv:2305.19709, 2023.
[19] ZHOU Z, SIDDIQUEE M M R, TAJBAKHSH N, et al. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation[J]. IEEE Transactions on Medical Imaging, 2020, 39(6): 1856-1867.
[20] SONG Y, SOHL-DICKSTEIN J, KINGMA D P, et al. Score-based generative modeling through stochastic differential equations[J]. arXiv:2011.13456, 2020.
[21] KONG J, KIM J, BAE J. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis[J]. Advances in Neural Information Processing Systems, 2020, 33: 17022-17033.
[22] CHOLLET F. Xception: Deep Learning with Depthwise Separable Convolutions[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 2017: 1800-1807.
[23] ELLINAS N, VAMVOUKAKIS G, MARKOPOULOS K, et al. High quality streaming speech synthesis with low, sentence-length-independent latency[J]. arXiv:2111.09052, 2021.
[24] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//North American Chapter of the Association for Computational Linguistics. 2019.
[25] LIU Y, OTT M, GOYAL N, et al. RoBERTa: A robustly optimized BERT pretraining approach[J]. arXiv:1907.11692, 2019.
[26] MISRA D. Mish: A Self Regularized Non-Monotonic Activation Function[C]//British Machine Vision Conference. 2020.
[27] SHEN Z, ZHANG M, ZHAO H, et al. Efficient Attention: Attention with Linear Complexities[C]//2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA, 2021: 3530-3538.