Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (11): 273-279. doi: 10.11896/jsjkx.230900006
GAO Beibei (高贝贝), ZHANG Yangsen (张仰森)
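Abstract: Grapheme-to-phoneme conversion is an important component of Chinese text-to-speech (TTS) systems. Its core problem is polyphone disambiguation: choosing the correct pronunciation for a polyphonic character from several candidate readings. Existing methods usually fail to fully capture the semantics of the word that contains the polyphonic character, and polyphone datasets suffer from imbalanced distributions. To address these problems, this paper proposes CLTRoBERTa (Cross-lingual Translation RoBERTa), a polyphone disambiguation method based on the pre-trained model RoBERTa. First, a cross-lingual translation module obtains a translation, in another language, of the word containing the polyphonic character and feeds it to the model as an additional feature to improve semantic understanding of the word. Next, the layer-wise learning rate strategy from discriminative fine-tuning is used to accommodate the different learning characteristics of different layers of the neural network. Finally, a sample weighting module is incorporated to address the imbalanced distribution of polyphone datasets. CLTRoBERTa reduces the performance gap caused by the imbalanced distribution of the dataset and achieves 99.08% accuracy on the CPP (Chinese Polyphone with Pinyin) benchmark dataset, outperforming the other baseline models.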
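The layer-wise learning rate and sample weighting ideas named in the abstract can be illustrated with a minimal sketch. The abstract gives neither the decay factor nor the weighting formula, so everything below is an assumption made for illustration: the function names, the geometric decay of 0.9 per layer, the base rate of 2e-5, and the inverse-frequency weighting are hypothetical choices, not the authors' settings.

```python
from collections import Counter


def layerwise_learning_rates(num_layers, top_lr=2e-5, decay=0.9):
    """Return one learning rate per layer, index 0 being the lowest layer.

    Assumed scheme: the top layer trains at top_lr and every layer below it
    is scaled down by a further factor of `decay`.
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def inverse_frequency_weights(labels):
    """Assumed weighting: weight = N / (K * count), so rare readings count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_mean_loss(losses, labels, weights):
    """Average per-sample losses, each scaled by the weight of its label."""
    total = sum(weights[y] * loss for loss, y in zip(losses, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


if __name__ == "__main__":
    # Toy label set: the character 乐 read as "le4" three times and "yue4" once.
    labels = ["le4", "le4", "le4", "yue4"]
    weights = inverse_frequency_weights(labels)
    print(weights)
    print(layerwise_learning_rates(12))
    print(weighted_mean_loss([0.2, 0.1, 0.3, 1.5], labels, weights))
```

In this sketch the lower layers, which tend to hold more general features, change more slowly than the upper layers, while the rare reading contributes as much total weight to the loss as the frequent one.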