Computer Science, 2021, Vol. 48, Issue (6A): 326-330. doi: 10.11896/jsjkx.200900104

• Intelligent Computing •

Music Style Transfer Method with Human Voice Based on CQT and Mel-spectrum

YE Hong-liang, ZHU Wan-ning, HONG Lei

  1. School of Software Engineering, Jinling Institute of Technology, Nanjing 211100, China
  • Online: 2021-06-10 Published: 2021-06-17
  • Corresponding author: ZHU Wan-ning (zhuwanning@jit.edu.cn)
  • About author: YE Hong-liang, born in 1999. His main research interests include deep learning and music processing.
    ZHU Wan-ning, born in 1983, Ph.D. His main research interests include quantum information technology and quantum computing.
  • Author e-mail: yehongliangiao@163.com
  • Supported by:
    Jinling Institute of Technology High-level Talent Research Startup Fund (jit-b-201624), Jiangsu Province University Student Innovation Training Program Project (202013573045Y) and Jiangsu University Philosophy and Social Science Foundation Project (2019SJA0485).

Abstract: In recent years, generative adversarial networks have performed well in image style transfer, but their performance on music has been mediocre. In particular, existing music style transfer methods work poorly on music with vocals. To address these problems, the CQT features and Mel-spectrum features of the music are first extracted; CycleGAN is then used to transfer the style of the joint CQT and Mel-spectrum features; finally, a WaveNet vocoder decodes the transferred spectrogram, realizing style transfer for music with vocals. The proposed model is evaluated on the public FMA dataset, where the average style transfer rate of music that meets the requirements reaches 94.07%. Compared with other algorithms, the music produced by this method achieves both a higher style transfer rate and better audio quality.

Key words: Generative adversarial networks, Music processing, Representation learning, Style transfer
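
The abstract combines two time-frequency representations, CQT and the Mel spectrum, whose frequency axes are laid out differently. The sketch below only illustrates that difference and is not the authors' feature extractor: CQT bin centres are geometrically spaced (a fixed number of bins per octave), while Mel centres are equally spaced on the mel scale. The starting frequency, bin counts, the HTK-style mel formula, and the simplified choice of placing centres directly on the band edges are all assumptions made for illustration; this page does not state the paper's actual settings.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced CQT bin centres: f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def hz_to_mel(f_hz):
    """HTK-style mel mapping (an assumed variant): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(f_lo, f_hi, n_mels):
    """Centres equally spaced on the mel axis between f_lo and f_hi (inclusive)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_mels - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels)]


# Illustrative values only (hypothetical, not taken from the paper).
F_MIN = 32.70            # roughly the pitch C1, in Hz
N_BINS = 84              # 7 octaves at 12 bins per octave
BINS_PER_OCTAVE = 12

cqt = cqt_center_frequencies(F_MIN, N_BINS, BINS_PER_OCTAVE)
mel = mel_center_frequencies(F_MIN, cqt[-1], N_BINS)

if __name__ == "__main__":
    # Print the lowest few centres of each axis over the same frequency range.
    for k in range(5):
        print(f"bin {k}: CQT {cqt[k]:8.2f} Hz | Mel {mel[k]:8.2f} Hz")
```

With these assumed values, consecutive CQT centres always differ by the same ratio, so each octave receives the same number of bins, whereas the Mel centres are closer to linear at low frequencies and spread out at high frequencies. This is one plausible reason for using the two feature sets jointly.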

CLC Number: TP183