Computer Science, 2021, Vol. 48, Issue (6A): 326-330. doi: 10.11896/jsjkx.200900104
叶洪良, 朱皖宁, 洪蕾
YE Hong-liang, ZHU Wan-ning, HONG Lei
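Abstract: In recent years, generative adversarial networks (GANs) have performed well in image style transfer, but their results in the music domain have been mediocre. In particular, existing music style transfer methods perform poorly on music that contains vocals. To address these problems, the method first extracts the CQT features and Mel-spectrogram features of the music, then applies CycleGAN to the joint CQT and Mel-spectrogram features to perform style transfer, and finally decodes the transferred spectrogram with a WaveNet vocoder, thereby achieving style transfer for music with vocals. The proposed model is evaluated on the public FMA dataset, where the average style transfer rate for music that meets the requirements reaches 94.07%. Compared with other algorithms, the music produced by this method achieves both a higher style transfer rate and better audio quality.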
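The abstract names CycleGAN as the style-transfer stage but does not state its loss. As a rough illustration only, the sketch below computes the cycle-consistency term from the general CycleGAN formulation on toy spectrogram-like arrays. Everything here is an assumption for illustration: the names `l1_distance`, `cycle_consistency_loss`, and `scale` are hypothetical, the "generators" are simple scaling functions rather than trained networks, and the arrays stand in for the joint CQT and Mel-spectrogram features, whose actual layout the abstract does not describe.

```python
from typing import Callable, List

# Toy stand-in for a spectrogram: rows = frequency bins, columns = time frames.
Spectrogram = List[List[float]]
Generator = Callable[[Spectrogram], Spectrogram]


def l1_distance(a: Spectrogram, b: Spectrogram) -> float:
    """Mean absolute difference between two equally shaped spectrograms."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def cycle_consistency_loss(
    x: Spectrogram, y: Spectrogram, g_xy: Generator, g_yx: Generator
) -> float:
    """L1 cycle loss: x -> style Y -> back to X, plus y -> style X -> back to Y."""
    forward = l1_distance(g_yx(g_xy(x)), x)
    backward = l1_distance(g_xy(g_yx(y)), y)
    return forward + backward


def scale(spec: Spectrogram, factor: float) -> Spectrogram:
    """Hypothetical 'generator': multiplies every bin by a constant."""
    return [[v * factor for v in row] for row in spec]


x = [[1.0, 2.0], [3.0, 4.0]]  # toy sample from style X
y = [[2.0, 4.0], [6.0, 8.0]]  # toy sample from style Y

# Generators that invert each other reconstruct the inputs exactly.
perfect = cycle_consistency_loss(
    x, y, lambda s: scale(s, 2.0), lambda s: scale(s, 0.5)
)

# If the reverse generator is the identity, the round trip drifts from the input.
imperfect = cycle_consistency_loss(
    x, y, lambda s: scale(s, 2.0), lambda s: scale(s, 1.0)
)

print(perfect, imperfect)
```

In a real system the two generators would be neural networks trained with this term alongside adversarial losses; the toy version only shows why the term penalizes mappings that cannot be undone.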