Research on Deep Neural Network Based Chinese Speech Synthesis

Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 75-78.


WANG Jian, ZHANG Yuan-yuan

School of Information, Central University of Finance and Economics, Beijing 100081

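Online: 2018-11-14 Published: 2018-11-14
Funding:
This work was supported by the Key Discipline Construction Project of Central University of Finance and Economics and the Beijing Higher Education Young Elite Teacher Project (YETP0988).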


Abstract

Abstract: To improve the quality of HMM-based speech synthesis, this paper examines how different network structures and parameters affect the training of deep neural networks (DNNs) and demonstrates that a DNN is effective at making S/U/V (silence/unvoiced/voiced) decisions. A DNN is then used to convert the spectral parameters of speech synthesized by the HMM system toward those of the original speech. The paper further explores converting the parameters obtained by the temporal decomposition (TD) algorithm: a DNN is trained on the event vectors produced by TD to build a conversion model, and the converted event vectors are resynthesized together with the unconverted event functions. Experiments show that the spectrum after DNN conversion is closer to the original spectrum, and subjective evaluation indicates that the method effectively improves the quality of the synthesized speech.
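The TD-based conversion scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes the usual TD model, in which each frame of the spectral parameter track is a weighted sum of event vectors with the weights given by event functions. The array shapes, the toy triangular event functions, and the untrained one-hidden-layer network `mlp_convert` are placeholders invented for the example; in the paper the conversion model is a trained DNN.

```python
import numpy as np

def td_resynthesize(event_vectors, event_functions):
    """Rebuild a parameter track from temporal-decomposition parameters.

    event_vectors:   (K, D) array, one D-dimensional spectral target per event.
    event_functions: (K, N) array, weight of each event at each of N frames.
    Returns an (N, D) array whose frame n is
    sum_k event_functions[k, n] * event_vectors[k].
    """
    return event_functions.T @ event_vectors

def mlp_convert(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer network (tanh hidden, linear output).

    Hypothetical stand-in for the trained conversion DNN.
    """
    return np.tanh(x @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
K, D, N, H = 4, 3, 10, 8  # events, feature dim, frames, hidden units (illustrative)

# Toy event functions: overlapping triangular bumps centred on each event,
# normalized so the weights at every frame sum to 1.
centers = np.linspace(0, N - 1, K)
frames = np.arange(N)
spacing = centers[1] - centers[0]
phi = np.maximum(0.0, 1.0 - np.abs(frames[None, :] - centers[:, None]) / spacing)
phi /= phi.sum(axis=0, keepdims=True)

# Toy event vectors standing in for TD output on synthesized speech.
event_vectors = rng.normal(size=(K, D))

# Untrained placeholder weights; a real system would learn these from
# paired synthesized/original event vectors.
w1, b1 = rng.normal(size=(D, H)), np.zeros(H)
w2, b2 = rng.normal(size=(H, D)), np.zeros(D)

# Convert only the event vectors; the event functions are reused unchanged.
converted = mlp_convert(event_vectors, w1, b1, w2, b2)
track = td_resynthesize(converted, phi)
print(track.shape)
```

Only the K event vectors pass through the network, so the conversion model sees far fewer inputs than a frame-by-frame mapping would, while the timing carried by the event functions is left untouched.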

Key words: HTS, DNN, Deep learning, Voice conversion, Temporal decomposition

WANG Jian, ZHANG Yuan-yuan. Research on Deep Neural Network Based Chinese Speech Synthesis[J]. Computer Science, 2015, 42(Z6): 75-78. https://doi.org/

