Computer Science ›› 2024, Vol. 51 ›› Issue (8): 281-296. doi: 10.11896/jsjkx.230500124
XU Bei¹·², LIU Tong¹
[1] TIE Y,CHEN H J,JIN C,et al.Research on emotion recognition method based on audio and video feature fusion[J].Journal of Chongqing University of Technology(Natural Science),2022,36(1):120-127.
[2] MA L,ZHONG W,MA X,et al.Learning to generate emotional music correlated with music structure features[J].Cognitive Computation and Systems,2022,4(2):100-107.
[3] SULUN S,DAVIES M E P,VIANA P.Symbolic music generation conditioned on continuous-valued emotions[J].IEEE Access,2022,10:44617-44626.
[4] HUNG H T,CHING J,DOH S,et al.EMOPIA:A Multi-Modal Pop Piano Dataset for Emotion Recognition and Emotion-based Music Generation[C]//Proceedings of the 22nd International Society for Music Information Retrieval Conference(ISMIR).2021:318-325.
[5] KINGMA D P,WELLING M.Auto-encoding variational Bayes[J].arXiv:1312.6114,2013.
[6] GREKOW J,DIMITROVA-GREKOW T.Monophonic music generation with a given emotion using conditional variational autoencoder[J].IEEE Access,2021,9:129088-129101.
[7] TAN H H,HERREMANS D.Music FaderNets:Controllable Music Generation Based on High-Level Features via Low-Level Feature Modelling[C]//Proceedings of the 21st International Society for Music Information Retrieval Conference(ISMIR).2020:109-116.
[8] DILOKTHANAKUL N,MEDIANO P A M,GARNELO M,et al.Deep unsupervised clustering with Gaussian mixture variational autoencoders[C]//International Conference on Learning Representations(ICLR).2017.
[9] RUSSELL J A.A circumplex model of affect[J].Journal of Personality and Social Psychology,1980,39(6):1161.
[10] LI Z,ZHAO Y,XU H,et al.Unsupervised clustering through Gaussian mixture variational autoencoder with non-reparameterized variational inference and std annealing[C]//2020 International Joint Conference on Neural Networks(IJCNN).IEEE,2020:1-8.
[11] ROBERTS A,ENGEL J,RAFFEL C,et al.A hierarchical latent vector model for learning long-term structure in music[C]//International Conference on Machine Learning(ICML).PMLR,2018:4364-4373.
[12] BRUNNER G,KONRAD A,WANG Y,et al.MIDI-VAE:Modeling dynamics and instrumentation of music with applications to style transfer[C]//Proceedings of the 19th International Society for Music Information Retrieval Conference(ISMIR).2018:747-754.
[13] DAI Z,YANG Z,YANG Y,et al.Transformer-XL:Attentive Language Models beyond a Fixed-Length Context[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics(ACL).2019:2978-2988.
[14] HEVNER K.Experimental studies of the elements of expression in music[J].The American Journal of Psychology,1936,48(2):246-268.
[15] CHEŁKOWSKA-ZACHAREWICZ M,JANOWSKI M.Polish adaptation of the Geneva Emotional Music Scale:Factor structure and reliability[J].Psychology of Music,2021,49(5):1117-1131.
[16] THAYER R E.The biopsychology of mood and arousal[M].Oxford University Press,1990.
[17] MEHRABIAN A.Silent messages:implicit communication of emotions and attitudes[M].Wadsworth Pub,1981.
[18] KREUTZ G,OTT U,TEICHMANN D,et al.Using music to induce emotions:Influences of musical preference and absorption[J].Psychology of Music,2008,36(1):101-126.
[19] VIEILLARD S,PERETZ I,GOSSELIN N,et al.Happy,sad,scary and peaceful musical excerpts for research on emotions[J].Cognition & Emotion,2008,22(4):720-752.
[20] YANG X,SONG Z,KING I,et al.A survey on deep semi-supervised learning[J].arXiv:2103.00550,2021.
[21] KINGMA D P,REZENDE D J,MOHAMED S,et al.Semi-supervised learning with deep generative models[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2(NIPS).2014:3581-3589.
[22] HABIB R,MARIOORYAD S,SHANNON M,et al.Semi-supervised generative modeling for controllable speech synthesis[C]//International Conference on Learning Representations(ICLR).2019.
[23] CHEUNG V K M,KAO H K,SU L.Semi-supervised violin fingering generation using variational autoencoders[C]//Proceedings of the 22nd International Society for Music Information Retrieval Conference(ISMIR).2021:113-120.
[24] SCHUSTER M,PALIWAL K K.Bidirectional recurrent neural networks[J].IEEE Transactions on Signal Processing,1997,45(11):2673-2681.
[25] LI Y,PAN Q,WANG S,et al.Disentangled variational autoencoder for semi-supervised learning[J].Information Sciences,2019,482:73-85.
[26] JOY T,SCHMON S M,TORR P H S,et al.Capturing label characteristics in VAEs[C]//International Conference on Learning Representations(ICLR).2021.
[27] DEMPSTER A P,LAIRD N M,RUBIN D B.Maximum likelihood from incomplete data via the EM algorithm[J].Journal of the Royal Statistical Society:Series B(Methodological),1977,39(1):1-22.
[28] LUO Y J,AGRES K,HERREMANS D.Learning disentangled representations of timbre and pitch for musical instrument sounds using Gaussian mixture variational autoencoders[C]//Proceedings of the 20th International Society for Music Information Retrieval Conference(ISMIR).2019:746-753.
[29] BENGIO Y,COURVILLE A,VINCENT P.Representation learning:A review and new perspectives[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(8):1798-1828.
[30] WANG X,CHEN H,TANG S,et al.Disentangled Representation Learning[J].arXiv:2211.11695,2022.
[31] HIGGINS I,MATTHEY L,PAL A,et al.beta-VAE:Learning basic visual concepts with a constrained variational framework[C]//International Conference on Learning Representations(ICLR).2017.
[32] CHEN R T Q,LI X,GROSSE R,et al.Isolating sources of disentanglement in VAEs[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems(NIPS).2018:2615-2625.
[33] KUMAR A,SATTIGERI P,BALAKRISHNAN A.Variational inference of disentangled latent concepts from unlabeled observations[C]//International Conference on Learning Representations(ICLR).2018.
[34] WANG Z,WANG D,ZHANG Y,et al.Learning interpretable representation for controllable polyphonic music generation[C]//Proceedings of the 21st International Society for Music Information Retrieval Conference(ISMIR).2020:662-669.
[35] YANG R,WANG D,WANG Z,et al.Deep music analogy via latent representation disentanglement[C]//Proceedings of the 20th International Society for Music Information Retrieval Conference(ISMIR).2019:596-603.
[36] WU Y,CARSAULT T,NAKAMURA E,et al.Semi-supervised neural chord estimation based on a variational autoencoder with latent chord labels and features[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:2956-2966.
[37] AKAMA T.Controlling Symbolic Music Generation based on Concept Learning from Domain Knowledge[C]//Proceedings of the 20th International Society for Music Information Retrieval Conference(ISMIR).2019:816-823.
[38] CHOI K,CHO K.Deep unsupervised drum transcription[C]//Proceedings of the 20th International Society for Music Information Retrieval Conference(ISMIR).2019:183-191.
[39] ZHANG Y.Representation learning for controllable music generation:A survey[C]//Proceedings of the 20th International Society for Music Information Retrieval Conference(ISMIR).2020:1-8.
[40] MI L,HE T,PARK C F,et al.Revisiting Latent Space Interpolation via a Quantitative Evaluation Framework[J].arXiv:2110.06421,2021.
[41] JIANG Z,ZHENG Y,TAN H,et al.Variational deep embedding:an unsupervised and generative approach to clustering[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence(IJCAI).2017:1965-1972.
[42] ZHAO T,LEE K,ESKENAZI M.Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics(ACL).2018:1098-1107.
[43] REZAABAD A L,VISHWANATH S.Learning representations by maximizing mutual information in variational autoencoders[C]//2020 IEEE International Symposium on Information Theory(ISIT).IEEE,2020:2729-2734.
[44] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems(NeurIPS).2017:5998-6008.
[45] JIANG J,XIA G G,CARLTON D B,et al.Transformer VAE:A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning[C]//2020 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP 2020).IEEE,2020:516-520.
[46] WU S L,YANG Y H.MuseMorphose:Full-song and fine-grained piano music style transfer with one transformer VAE[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2023,31:1953-1967.
[47] DONG H W,HSIAO W Y,YANG L C,et al.MuseGAN:Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment[C]//Proceedings of the AAAI Conference on Artificial Intelligence(AAAI).2018:34-41.
[48] BERTIN-MAHIEUX T,ELLIS D P W,WHITMAN B,et al.The million song dataset[C]//Proceedings of the 12th International Society for Music Information Retrieval Conference(ISMIR).2011:591-596.
[49] HUANG Y S,YANG Y H.Pop music transformer:Beat-based modeling and generation of expressive pop piano compositions[C]//Proceedings of the 28th ACM International Conference on Multimedia(ACM Multimedia).2020:1180-1188.
[50] GLOROT X,BENGIO Y.Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the 13th International Conference on Artificial Intelligence and Statistics(AISTATS).JMLR Workshop and Conference Proceedings,2010:249-256.
[51] ZHENG K,MENG R,ZHENG C,et al.EmotionBox:A music-element-driven emotional music generation system based on music psychology[J].Frontiers in Psychology,2022,13:5189.
[52] VAN DER MAATEN L,HINTON G.Visualizing Data using t-SNE[J].Journal of Machine Learning Research,2008,9:2579-2605.
[53] KAWAI L,ESLING P,HARADA T.Attributes-Aware Deep Music Transformation[C]//Proceedings of the 21st International Society for Music Information Retrieval Conference(ISMIR).2020:670-677.
[54] DONG H W,HSIAO W Y,YANG Y H.Pypianoroll:Open-source Python package for handling multitrack pianorolls[C]//Proceedings of the 19th International Society for Music Information Retrieval Conference(ISMIR).Late-breaking paper,2018.