Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100014-13. doi: 10.11896/jsjkx.241100014
XU Bei (胥备)1,2, ZHAO Dan (赵丹)1
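Abstract: Music is an important means by which people express emotion. Music emotion conversion technology transforms an original piece of music into music that carries a target emotion, meeting users' demand for music with diverse emotions and improving creative efficiency. Existing music emotion conversion techniques build deep learning models to achieve end-to-end emotion conversion, but the emotion vectors they use to represent music correspond poorly to actual musical features, so the intermediate layers lack interpretability; this limits the accuracy of music emotion conversion to some extent and may cause vanishing gradients. To address these problems, a music emotion conversion model based on the CVAE-WGAN (Conditional Variational Autoencoder Wasserstein Generative Adversarial Network) architecture is proposed. It replaces the traditional GAN with a WGAN-GP network, introducing the Wasserstein distance and a gradient penalty mechanism to effectively avoid mode collapse and vanishing gradients, thereby improving training stability and generation quality. In addition, to address the lack of interpretability in the intermediate process of generative models, 64 clearly interpretable mid-level perceptual features covering melody, harmony, rhythm, dynamics, timbre, expressivity, and musical form are incorporated into the model as latent-space variables, ensuring that each dimension of the latent space corresponds to a specific musical feature. Furthermore, the model uses a Gaussian mixture model in place of the single Gaussian in the variational autoencoder to capture and represent the distributions of musical features under different emotion categories. Experimental results show that the model performs well on mutual conversion among six typical emotions (happiness, sadness, tenderness, anger, fear, and surprise), and outperforms the comparison models in emotion accuracy, reconstruction error, generation coherence, and generation diversity.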
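For readers unfamiliar with the two ingredients named in the abstract, the block below is a minimal sketch of their standard textbook forms; it is not taken from the paper, whose exact losses may differ. The first two lines give the usual WGAN-GP critic loss, where the critic $D$, the real and generated distributions $P_r$ and $P_g$, the interpolate $\hat{x}$, and the penalty weight $\lambda$ are notation assumed here (any emotion condition fed to the critic is omitted). The last line shows a generic Gaussian mixture prior over a 64-dimensional latent vector $z$; the number of components $K$ and how they relate to the emotion categories are assumptions.

```latex
\begin{align*}
% Standard WGAN-GP critic loss (assumed notation, not the paper's)
L_D &= \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x\sim P_r}\big[D(x)\big]
     + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}
       \Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big], \\
% Penalty points are sampled on lines between real and generated samples
\hat{x} &= \epsilon\,x + (1-\epsilon)\,\tilde{x},
  \qquad \epsilon \sim U[0,1], \\
% Generic Gaussian mixture prior replacing the single Gaussian of a VAE
p(z) &= \sum_{k=1}^{K} \pi_k\,\mathcal{N}\big(z;\,\mu_k,\,\Sigma_k\big),
  \qquad z \in \mathbb{R}^{64},\quad \sum_{k=1}^{K}\pi_k = 1 .
\end{align*}
```

The gradient penalty term softly enforces the 1-Lipschitz constraint that the Wasserstein formulation requires, which is why the critic can keep supplying informative gradients even when real and generated distributions barely overlap. A mixture prior lets different regions of the latent space concentrate around different modes, which a single Gaussian cannot represent.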