Computer Science, 2022, 49(6A): 301-308. DOI: 10.11896/jsjkx.210300134
• Image Processing & Multimedia Technology •
LIU Chang, WEI Wei-min, MENG Fan-xing, CAI Zhi
References
[1] ABE M,NAKAMURA S,SHIKANO K,et al.Voice conversion through vector quantization[C]//International Conference on Acoustics,Speech,and Signal Processing(ICASSP-88).New York,USA,1988:655-658.
[2] STYLIANOU Y,CAPPE O,MOULINES E.Continuous probabilistic transform for voice conversion[J].IEEE Transactions on Speech and Audio Processing,1998,6(2):131-142.
[3] TAMAMORI A,HAYASHI T,KOBAYASHI K,et al.Speaker-dependent WaveNet vocoder[C]//Proceedings of Interspeech.2017:1118-1122.
[4] LING Z H,DENG L,YU D.Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis[C]//IEEE International Conference on Acoustics,Speech and Signal Processing.IEEE,2013.
[5] KANEKO T,KAMEOKA H,HIRAMATSU K,et al.Sequence-to-sequence voice conversion with similarity metric learned using generative adversarial networks[C]//Proceedings of Interspeech.Stockholm,2017:1283-1287.
[6] HAYASHI T,TAMAMORI A,KOBAYASHI K,et al.An investigation of multi-speaker training for WaveNet vocoder[C]//IEEE Automatic Speech Recognition and Understanding Workshop(ASRU).Okinawa,2017:712-718.
[7] ZHANG X W,MIAO X K,ZENG X,et al.Research status and prospect of speech conversion technology[J].Data Acquisition and Processing,2019,34(5):753-770.
[8] NARENDRANATH M,MURTHY H A,RAJENDRAN S.Transformation of formants for voice conversion using artificial neural networks[J].Speech Communication,1995,16(2):207-216.
[9] KAWAHARA H.Speech representation and transformation using adaptive interpolation of weighted spectrum:vocoder revisited[C]//IEEE International Conference on Acoustics,Speech,and Signal Processing.Munich,1997:1303-1306.
[10] AL-RADHI M S,CSAPÓ T G,NÉMETH G.Continuous vocoder applied in deep neural network based voice conversion[J].Multimedia Tools and Applications,2019,78:33549-33572.
[11] KOBAYASHI K,HAYASHI T,TAMAMORI A,et al.Statistical voice conversion with WaveNet-based waveform generation[C]//Proceedings of Interspeech.2017:1138-1142.
[12] OORD A V D,DIELEMAN S,ZEN H,et al.WaveNet:A generative model for raw audio[EB/OL].(2016-09-12).https://arxiv.org/abs/1609.03499.
[13] NIWA J,YOSHIMURA T,HASHIMOTO K,et al.Statistical voice conversion with WaveNet vocoder[J].arXiv:1907.08940,2020.
[14] HAYASHI T,TAMAMORI A,KOBAYASHI K,et al.An investigation of multi-speaker training for WaveNet vocoder[C]//IEEE Automatic Speech Recognition and Understanding Workshop(ASRU).Okinawa,2017:712-718.
[15] SHEN J,PANG R,WEISS R J,et al.Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions[EB/OL].(2017-12-16).https://arxiv.org/abs/1712.05884.
[16] KOBAYASHI K,HAYASHI T,TAMAMORI A,et al.Statistical voice conversion with WaveNet-based waveform generation[C]//Interspeech 2017.Stockholm,Sweden,2017:20-24.
[17] CHEN K,CHEN B,LAI J H,et al.High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder[C]//Interspeech.2018.
[18] SISMAN B,ZHANG M,LI H.Group Sparse Representation with WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2019,27(6):1085-1097.
[19] WU Y C,KOBAYASHI K,HAYASHI T,et al.Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder[C]//Interspeech.2018:1988-1992.
[20] WU Y,TOBING P L,KOBAYASHI K,et al.Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression[J].IEEE Access,2020,8:62094-62106.
[21] HELANDER E,SCHWARZ J,SILEN H,et al.On the impact of alignment on voice conversion performance[C]//9th Annual Conference of the International Speech Communication Association(INTERSPEECH 2008).Brisbane,Australia,2008:22-26.
[22] HSU C C,HWANG H T,WU Y C.Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network[J].arXiv:1610.03988v1,2016.
[23] SHAH N J,PATIL H A.A novel approach to remove outliers for parallel voice conversion[J].Computer Speech & Language,2019,58:127-152.
[24] KAMEOKA H,TANAKA K,KWAŚNY D,et al.ConvS2S-VC:Fully Convolutional Sequence-to-Sequence Voice Conversion[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:1849-1863.
[25] MOUCHTARIS A,DER SPIEGEL J V,MUELLER P.Nonparallel training for voice conversion based on a parameter adaptation approach[J].IEEE Transactions on Audio,Speech,and Language Processing,2006,14(3):952-963.
[26] DUXANS H,ERRO D,PÉREZ J.Voice Conversion of Non-aligned Data using Unit Selection[C]//TC-STAR Workshop on Speech-to-Speech Translation.Barcelona,Spain,2006:19-21.
[27] LEE C H,WU C H.MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training[C]//Ninth International Conference on Spoken Language Processing(INTERSPEECH 2006-ICSLP).Pittsburgh,PA,USA,2006:17-21.
[28] ERRO D,MORENO A,BONAFONTE A.INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora[J].IEEE Transactions on Audio,Speech,and Language Processing,2010,18(5):944-953.
[29] SAITO D,WATANABE S,NAKAMURA A,et al.Statistical voice conversion based on noisy channel model[J].IEEE Transactions on Audio,Speech,and Language Processing,2012,20(6):1784-1794.
[30] XIE F L,SOONG F K,LI H.A KL divergence and DNN-based approach to voice conversion without parallel training sentences[C]//Interspeech.2016:287-291.
[31] KINNUNEN T,JUVELA L,ALKU P,et al.Non-parallel voice conversion using i-vector PLDA:Towards unifying speaker verification and transformation[C]//IEEE International Conference on Acoustics,Speech and Signal Processing.2017:5535-5539.
[32] HSU C C,HWANG H T,WU Y C,et al.Voice conversion from non-parallel corpora using variational auto-encoder[C]//2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference(APSIPA).2016:1-6.
[33] HSU C C,HWANG H T,WU Y C,et al.Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial network[C]//Interspeech.2017.
[34] KANEKO T,KAMEOKA H.CycleGAN-VC:Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks[C]//2018 26th European Signal Processing Conference(EUSIPCO).Rome,2018:2100-2104.
[35] KANEKO T,KAMEOKA H,TANAKA K,et al.CycleGAN-VC2:Improved CycleGAN-based Non-parallel Voice Conversion[C]//ICASSP.2019:6820-6824.
[36] KAMEOKA H,KANEKO T,TANAKA K,et al.StarGAN-VC:Non-parallel Many-to-many Voice Conversion Using Star Generative Adversarial Networks[C]//IEEE Spoken Language Technology Workshop(SLT).Athens,Greece,2018:266-273.
[37] KAMEOKA H,KANEKO T,TANAKA K.ACVAE-VC:Non-parallel many-to-many voice conversion with auxiliary classifier variational auto-encoder[J].arXiv:1806.02169,2018.
[38] LU B.Research on speech conversion technology[D].Chengdu:University of Electronic Science and Technology of China,2016.
[39] KAIN A,MACON M W.Spectral voice conversion for text-to-speech synthesis[C]//Proceedings of the 1998 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP '98).1998:285-288.
[40] CHEN Y N,CHU M,CHANG C.Voice conversion with smoothed GMM and MAP adaptation[C]//8th European Conference on Speech Communication and Technology(EUROSPEECH 2003-INTERSPEECH 2003).Geneva,Switzerland,2003:1-4.
[41] TODA T,BLACK A W,TOKUDA K.Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory[J].IEEE Transactions on Audio,Speech,and Language Processing,2007,15(8):2222-2235.
[42] TODA T,OHTANI Y,SHIKANO K.One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices[C]//2007 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP '07).Honolulu,HI,2007:1249-1252.
[43] HELANDER E,VIRTANEN T,NURMINEN J,et al.Voice Conversion Using Partial Least Squares Regression[J].IEEE Transactions on Audio,Speech,and Language Processing,2010,18(5):912-921.
[44] SAITO D,MINEMATSU N,HIROSE K.Tensor Factor Analysis for Arbitrary Speaker Conversion[J].IEICE Transactions on Information and Systems,2020,103(6):1395-1405.
[45] MOHAMMADI S H,KAIN A.Voice conversion using deep neural networks with speaker-independent pre-training[C]//2014 IEEE Spoken Language Technology Workshop(SLT).South Lake Tahoe,NV,2014:19-23.
[46] MING H P,HUANG D Y,XIE L.Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion[C]//Interspeech 2016.2016:2016-1053.
[47] CHEN L H,LIU L J,LING Z H.The USTC System for Voice Conversion Challenge 2016:Neural Network Based Approaches for Spectrum,Aperiodicity and F0 Conversion[C]//INTERSPEECH 2016.San Francisco,USA,2016:8-12.
[48] KANEKO T,KAMEOKA H,HIRAMATSU K,et al.Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks[C]//Interspeech 2017.Stockholm,Sweden,2017:20-24.
[49] KANEKO T,KAMEOKA H,HOJO N.Generative adversarial network-based postfilter for statistical parametric speech synthesis[C]//IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).2017.
[50] VEKKOT S,GUPTA D,ZAKARIAH M,et al.Emotional Voice Conversion Using a Hybrid Framework With Speaker-Adaptive DNN and Particle-Swarm-Optimized Neural Network[J].IEEE Access,2020,8:74627-74647.
[51] STYLIANOU Y.Voice Transformation:A survey[C]//IEEE International Conference on Acoustics,Speech and Signal Processing.Taipei,2009:3585-3588.
[52] MATSUMOTO H,HIKI S,SONE T,et al.Multidimensional Representation of Personal Quality of Vowels and its Acoustical Correlates[J].IEEE Transactions on Audio and Electroacoustics,1973,21(5):428-436.
[53] KOBAYASHI K,TODA T,NAKAMURA S.Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification[C]//2016 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).Shanghai,2016:5670-5674.
[54] LEE K S.Voice Conversion Using a Perceptual Criterion[J].Applied Sciences,2020,10(8):28-44.
[55] KOMINEK J,BLACK A W.The CMU Arctic speech databases[C]//Proceedings of the ISCA Speech Synthesis Workshop.2004.
[56] TODA T,CHEN L H,SAITO D,et al.The Voice Conversion Challenge 2016[C]//Interspeech.2016.
[57] LORENZO-TRUEBA J,YAMAGISHI J,TODA T,et al.The Voice Conversion Challenge 2018:Promoting Development of Parallel and Nonparallel Methods[C]//Odyssey 2018:The Speaker and Language Recognition Workshop.2018.
[58] ZHAO Y,HUANG W C,TIAN X,et al.Voice Conversion Challenge 2020:Intra-lingual semi-parallel and cross-lingual voice conversion[C]//Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge.Shanghai,China,2020.
[59] WU Y C,TOBING P L,HAYASHI T.The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018[C]//Odyssey 2018:The Speaker and Language Recognition Workshop.2018.
[60] CHEN H J,LIANG Q Z,XIE L,et al.Unsupervised acoustic modeling based on DP-GMM:Parallel inference and feasibility study[C]//Proceedings of the National Conference on Man-Machine Speech Communication(NCMMSC'2015).2015:69-70.
[61] ZHOU Y,TIAN X,LI H.Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual Voice Conversion[J].IEEE Signal Processing Letters,2020,27:1310-1314.