Polyphone Disambiguation Based on Pre-trained Model

doi:10.11896/jsjkx.230900006

Abstract

Grapheme-to-phoneme conversion (G2P) is an important part of Chinese text-to-speech (TTS) systems. The key issue in G2P is selecting the correct pronunciation for a polyphonic character from several alternatives. Existing methods often fail to fully capture the semantics of words that contain polyphonic characters, and they do not handle the imbalanced class distribution of the datasets effectively. To address these problems, this paper proposes a polyphone disambiguation method based on the pre-trained model RoBERTa, called cross-lingual translation RoBERTa (CLTRoBERTa). First, a cross-lingual translation module generates a translation of the word containing the polyphonic character into another language, which serves as an additional input feature and improves the model's semantic comprehension. Second, a hierarchical learning rate optimization strategy assigns suitable learning rates to the different layers of the neural network. Finally, a sample weight module is added to the model to counter the imbalanced distribution in the dataset. Experimental results show that CLTRoBERTa reduces the performance differences caused by the uneven dataset distribution and achieves 99.08% accuracy on the public Chinese Polyphones with Pinyin (CPP) dataset, outperforming the other baseline models.
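The abstract names two training-side components, the hierarchical learning rate and the sample weight module, without giving their exact formulas. The sketch below assumes two common choices and is not the authors' implementation: a geometric per-layer learning-rate decay (discriminative fine-tuning, where lower layers train more slowly) and inverse-frequency class weights used in a weighted negative log-likelihood. The function names, the 12-layer depth (the size of RoBERTa-base), the rates 2e-5 and 0.9, and the toy labels are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def layerwise_learning_rates(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """One learning rate per encoder layer (index 0 = layer closest to the embeddings).

    Hypothetical schedule: the top layer trains at `top_lr`, and each layer
    below it is multiplied by `decay` (< 1), so lower layers change more slowly.
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each pronunciation class by n / (k * count), so rare readings count more.

    n = number of samples, k = number of distinct classes; a perfectly
    balanced dataset gives every class weight 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_nll(
    probs: Sequence[Mapping[str, float]],
    gold: Sequence[str],
    weights: Mapping[str, float],
) -> float:
    """Class-weighted negative log-likelihood, normalized by the total target weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, gold))
    return total / sum(weights[y] for y in gold)


if __name__ == "__main__":
    # Illustrative values only: 12 layers, top rate 2e-5, decay 0.9.
    lrs = layerwise_learning_rates(12, 2e-5, 0.9)
    print(f"bottom layer lr={lrs[0]:.2e}, top layer lr={lrs[-1]:.2e}")

    # Toy labels for the polyphone 行 (hang2 / xing2), skewed 3:1 toward hang2.
    labels = ["hang2", "hang2", "hang2", "xing2"]
    w = inverse_frequency_weights(labels)
    print("class weights:", w)

    # Hypothetical model outputs for two sentences containing 行.
    probs = [{"hang2": 0.9, "xing2": 0.1}, {"hang2": 0.6, "xing2": 0.4}]
    gold = ["hang2", "xing2"]
    print(f"weighted loss={weighted_nll(probs, gold, w):.4f}")
```

In a PyTorch training loop, per-layer rates like these would typically be passed to the optimizer as one parameter group per layer, and the class weights as the `weight` argument of `torch.nn.CrossEntropyLoss`, whose `reduction='mean'` divides by the summed target weights in the same way as `weighted_nll` above.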

Key words: Polyphone disambiguation, Pre-trained model, Grapheme-to-phoneme conversion, Cross-lingual translation, Hierarchical learning rate, Sample weight

CLC Number: TP391

GAO Beibei, ZHANG Yangsen. Polyphone Disambiguation Based on Pre-trained Model[J].Computer Science, 2024, 51(11): 273-279.
