Computer Science ›› 2021, Vol. 48 ›› Issue (6A): 326-330. doi: 10.11896/jsjkx.200900104

• Intelligent Computing •

Music Style Transfer Method with Human Voice Based on CQT and Mel-spectrum

YE Hong-liang, ZHU Wan-ning, HONG Lei   

  1. School of Software Engineering, Jinling Institute of Technology, Nanjing 211100, China
  • Online: 2021-06-10  Published: 2021-06-17
  • About author: YE Hong-liang, born in 1999. His main research interests include deep learning and music processing.
    ZHU Wan-ning, born in 1983, Ph.D. His main research interests include quantum information technology and quantum computing.
  • Supported by:
    Jinling Institute of Technology High-level Talent Research Startup Fund Support (jit-b-201624), Jiangsu Province University Student Innovation Training Program Project (202013573045Y) and Jiangsu University Philosophy and Social Science Foundation Project (2019SJA0485).

Abstract: In recent years, generative adversarial networks have performed well in image style transfer, but their performance in the music domain remains mediocre, and existing music style transfer methods handle music containing human voice poorly. To address these problems, the CQT and Mel-spectrum features of the music are first extracted, CycleGAN is then used to perform style transfer on the combined CQT and Mel-spectrum representation, and finally a WaveNet vocoder decodes the transferred spectrum back into audio, thereby realizing style transfer for music with vocals. The proposed model is evaluated on the public FMA dataset, where the average style transfer rate of music meeting the requirements reaches 94.07%. Compared with other algorithms, both the style transfer rate and the audio quality of the music produced by this method are better.
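As a rough illustration of the feature-extraction stage described above, the sketch below computes a CQT magnitude spectrogram and a Mel spectrogram with librosa and stacks them into a single combined feature map of the kind that would be passed on to CycleGAN. The function name extract_cqt_mel, the hop length, the bin counts and the dB scaling are illustrative assumptions rather than the parameters reported in the paper, and the CycleGAN transfer and WaveNet decoding stages are omitted.

    # Minimal sketch, assuming librosa is available; parameter values are
    # illustrative choices, not those used in the paper.
    import numpy as np
    import librosa

    def extract_cqt_mel(path, sr=22050, hop_length=256,
                        n_cqt_bins=84, bins_per_octave=12, n_mels=80):
        """Return a stacked (n_cqt_bins + n_mels, frames) CQT + Mel feature map."""
        y, sr = librosa.load(path, sr=sr, mono=True)

        # Constant-Q transform: log-spaced frequency bins, magnitude converted to dB.
        cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=hop_length,
                                 n_bins=n_cqt_bins, bins_per_octave=bins_per_octave))
        cqt_db = librosa.amplitude_to_db(cqt, ref=np.max)

        # Mel power spectrogram, also converted to dB.
        mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length,
                                             n_mels=n_mels)
        mel_db = librosa.power_to_db(mel, ref=np.max)

        # Align frame counts (they can differ by a frame) and stack along the
        # frequency axis to form the combined CQT + Mel representation.
        frames = min(cqt_db.shape[1], mel_db.shape[1])
        return np.vstack([cqt_db[:, :frames], mel_db[:, :frames]])

    # Example: features = extract_cqt_mel("song.wav")  -> shape (164, frames) with these defaults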

Key words: Generative adversarial networks, Music processing, Representation learning, Style transfer

CLC Number: TP183

[1] JING Y,YANG Y,FENG Z,et al.Neural style transfer:A review[J].IEEE Transactions on Visualization and Computer Graphics,2019,26(11):3365-3385.
[2] DAI S,ZHANG Z,XIA G G.Music style transfer:A position paper[J].arXiv:1803.06841,2018.
[3] GATYS L A,ECKER A S,BETHGE M.A neural algorithm of artistic style[J].arXiv:1508.06576,2015.
[4] ZHU J Y,PARK T,ISOLA P,et al.Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2223-2232.
[5] YI Z,ZHANG H,TAN P,et al.DualGAN:Unsupervised dual learning for image-to-image translation[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2849-2857.
[6] KIM T,CHA M,KIM H,et al.Learning to discover cross-domain relations with generative adversarial networks[J].arXiv:1703.05192,2017.
[7] BRUNNER G,WANG Y,WATTENHOFER R,et al.Symbolic music genre transfer with cyclegan[C]//2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI).IEEE,2018:786-793.
[8] HUANG S,LI Q,ANIL C,et al.Timbretron:A wavenet (cyclegan (cqt (audio))) pipeline for musical timbre transfer[J].arXiv:1811.09620,2018.
[9] MOR N,WOLF L,POLYAK A,et al.A universal music translation network[J].arXiv:1805.07848,2018.
[10] GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.2014:2672-2680.
[11] OORD A,DIELEMAN S,ZEN H,et al.Wavenet:A generative model for raw audio[J].arXiv:1609.03499,2016.
[12] POLYAK A,WOLF L.Attention-based WaveNet autoencoder for universal voice conversion[C]//ICASSP 2019-2019 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP).IEEE,2019:6800-6804.
[13] ODENA A,DUMOULIN V,OLAH C.Deconvolution and checkerboard artifacts[J].Distill,2016,1(10):e3.
[14] ENGEL J,RESNICK C,ROBERTS A,et al.Neural audio synthesis of musical notes with wavenet autoencoders[C]//International Conference on Machine Learning.PMLR,2017:1068-1077.
[15] DEFFERRARD M,BENZI K,VANDERGHEYNST P,et al.FMA:A dataset for music analysis[J].arXiv:1612.01840,2016.
[16] WU M,LIU X.A Double Weighted KNN Algorithm and Its Application in the Music Genre Classification[C]//2019 6th International Conference on Dependable Systems and Their Applications (DSA).IEEE,2020:335-340.