Computer Science, 2022, Vol. 49, Issue (3): 179-184. doi: 10.11896/jsjkx.201200081
黎思泉, 万永菁, 蒋翠玲
LI Si-quan, WAN Yong-jing, JIANG Cui-ling
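Abstract: Multiple fundamental frequency (multi-F0) estimation is widely used in music structure analysis, computer-assisted music education, information retrieval and other fields. To accurately recognize random chords in a piece of music, a multi-F0 estimation algorithm based on image removal with a generative adversarial network (GAN) is proposed. First, the complete audio is segmented into note segments, and a harmonic fingerprint map is proposed to extract the spectral features of each note segment. Then, a convolutional neural network (CNN) identifies the current dominant fundamental frequency in the harmonic fingerprint map. The identified dominant fundamental frequency is treated as an image that interferes with recognition of the next fundamental frequency; a GAN removes this interfering image, and a new round of multi-F0 estimation is performed on the harmonic fingerprint map after removal. Finally, multi-F0 estimation of the complete chord is achieved by iterating the image-removal operation level by level. Experiments on a piano audio database composed of random two-note and random three-note chords show that, compared with the classical iterative spectral deletion algorithm and a large-vocabulary chord recognition algorithm, the proposed algorithm adapts to random chord recognition, is highly robust across different pitch ranges, and clearly improves overall accuracy.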
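The identify-then-remove iteration described in the abstract can be sketched as a short loop. This is only an illustration of the control flow, not the authors' implementation: the abstract does not specify the CNN, the GAN, or the layout of the harmonic fingerprint map, so all three are replaced here by hypothetical toy stand-ins. The "map" is a dictionary from frequency (Hz) to magnitude, `identify_dominant` is a plain harmonic-sum score, and `remove_image` simply deletes the partials of the identified note. All names and the constant `N_HARMONICS` are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Toy stand-in for a harmonic fingerprint map: frequency (Hz) -> magnitude.
Spectrum = Dict[int, float]

N_HARMONICS = 5  # assumption for this toy example


def synth(f0s: List[int]) -> Spectrum:
    """Build a toy spectrum: each note adds partials h*f0 with magnitude 1/h."""
    spec: Spectrum = {}
    for f0 in f0s:
        for h in range(1, N_HARMONICS + 1):
            spec[h * f0] = spec.get(h * f0, 0.0) + 1.0 / h
    return spec


def identify_dominant(spec: Spectrum, candidates: List[int],
                      floor: float = 0.5) -> Optional[int]:
    """Stand-in for the CNN: return the candidate with the largest harmonic sum,
    or None when even the best candidate scores below `floor`."""
    def salience(f0: int) -> float:
        return sum(spec.get(h * f0, 0.0) for h in range(1, N_HARMONICS + 1))

    best = max(candidates, key=salience)
    return best if salience(best) >= floor else None


def remove_image(spec: Spectrum, f0: int) -> Spectrum:
    """Stand-in for the GAN: delete every partial of the identified note.
    Hard deletion also discards energy shared with other notes."""
    partials = {h * f0 for h in range(1, N_HARMONICS + 1)}
    return {f: a for f, a in spec.items() if f not in partials}


def estimate_chord(spec: Spectrum, candidates: List[int],
                   max_notes: int = 3) -> List[int]:
    """Iterate: identify the dominant F0, remove its image, repeat."""
    found: List[int] = []
    for _ in range(max_notes):
        f0 = identify_dominant(spec, candidates)
        if f0 is None:  # nothing salient is left
            break
        found.append(f0)
        spec = remove_image(spec, f0)
    return found


chord = estimate_chord(synth([220, 330]), candidates=[220, 277, 330, 440])
print(chord)  # prints [220, 330]
```

In this toy, the two notes share the 660 Hz partial, and the hard deletion in `remove_image` removes it entirely along with the first note. A learned removal step could in principle leave the part of that energy that belongs to the remaining notes, which is one reason to replace this stand-in with a trained model.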