Computer Science ›› 2024, Vol. 51 ›› Issue (11): 273-279. doi: 10.11896/jsjkx.230900006

• Artificial Intelligence •

Polyphone Disambiguation Based on Pre-trained Model

GAO Beibei, ZHANG Yangsen

  1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2023-09-04 Revised: 2024-02-08 Published: 2024-11-15 Online: 2024-11-06
  • Corresponding author: ZHANG Yangsen (zhangyangsen@163.com)
  • About author: GAO Beibei (beibgao@163.com), born in 2000, postgraduate. Her main research interests include natural language processing and machine learning.
    ZHANG Yangsen, born in 1962, postdoctor, professor, Ph.D. supervisor, is a member of CCF (No. 16640S). His main research interests include natural language processing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (62176023).

Abstract: Grapheme-to-phoneme (G2P) conversion is an important component of Chinese text-to-speech (TTS) systems. Its core problem is polyphone disambiguation: selecting the correct pronunciation of a polyphonic character from several candidates. Existing methods usually fail to fully capture the semantics of the word that contains the polyphonic character, and polyphone datasets are unevenly distributed. To address these problems, this paper proposes CLTRoBERTa (Cross-lingual Translation RoBERTa), a polyphone disambiguation method based on the pre-trained model RoBERTa. First, a cross-lingual translation module obtains a translation, in another language, of the word containing the polyphonic character and feeds it to the model as an additional feature to improve semantic understanding of the word. Second, the hierarchical (layer-wise) learning rate strategy from discriminative fine-tuning is used to adapt to the different learning characteristics of the network's layers. Finally, a sample weight module is added to address the imbalanced distribution of polyphone datasets. Experimental results show that CLTRoBERTa reduces the performance differences caused by the imbalanced dataset distribution and achieves 99.08% accuracy on the public CPP (Chinese Polyphone with Pinyin) benchmark dataset, outperforming other baseline models.

Key words: Polyphone disambiguation, Pre-trained model, Grapheme-to-phoneme conversion, Cross-lingual translation, Hierarchical learning rate, Sample weight
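
Two of the ideas named in the abstract, layer-wise learning rates and sample weights, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas or hyperparameters, so everything here is an assumption: the geometric decay rule follows the general idea of discriminative fine-tuning, the inverse-frequency weighting is one common way to weight rare classes, and the function names, the decay factor, and the toy pinyin labels are hypothetical.

```python
import math
from collections import Counter


def layerwise_learning_rates(num_layers, top_lr, decay):
    """Assumed rule: the top layer trains with top_lr, and each layer
    below it uses the rate of the layer above multiplied by decay."""
    return [top_lr * decay ** (num_layers - 1 - layer)
            for layer in range(num_layers)]


def inverse_frequency_weights(labels):
    """Assumed weighting: weight = N / (K * count), so rare
    pronunciations receive larger weights than frequent ones."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(correct label) over the samples.
    probs is a list of dicts mapping each label to its predicted probability."""
    numerator = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    denominator = sum(weights[y] for y in labels)
    return numerator / denominator


# Toy data: an imbalanced set of readings for one polyphonic character.
labels = ["zhong4", "zhong4", "zhong4", "chong2"]
weights = inverse_frequency_weights(labels)
rates = layerwise_learning_rates(num_layers=3, top_lr=1e-4, decay=0.5)
probs = [{"zhong4": 0.9, "chong2": 0.1}] * 3 + [{"zhong4": 0.6, "chong2": 0.4}]
loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy example the rare reading `chong2` gets three times the weight of `zhong4`, so the single misclassified rare sample contributes more to the loss than it would under an unweighted average. In a real fine-tuning setup the per-layer rates would typically be passed to the optimizer as separate parameter groups.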

CLC Number:

  • TP391