Computer Science, 2022, Vol. 49, Issue (3): 179-184. doi: 10.11896/jsjkx.201200081

• Computer Graphics & Multimedia •

Multiple Fundamental Frequency Estimation Algorithm Based on Generative Adversarial Networks for Image Removal

LI Si-quan, WAN Yong-jing, JIANG Cui-ling   

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200000, China
  • Received: 2020-12-08 Revised: 2021-05-06 Online: 2022-03-15 Published: 2022-03-15
  • Corresponding author: JIANG Cui-ling (cuilingjiang@ecust.edu.cn)
  • About author: LI Si-quan (siquan_li@163.com), born in 1996, master. His main research interests include computer learning and audio analysis.
    JIANG Cui-ling, born in 1976, Ph.D., lecturer. Her main research interests include artificial intelligence and image processing.

Abstract: Multiple fundamental frequency estimation is widely used in music structure analysis, music-aided education, information retrieval and other fields. This paper proposes a multiple fundamental frequency estimation algorithm that uses generative adversarial networks to remove interference images. The goal is to identify random chords in music accurately.

First, the complete audio is segmented into note segments. A harmonic fingerprint map is proposed to extract the spectral features of each note segment. Next, a convolutional neural network identifies the current dominant fundamental frequency of the harmonic fingerprint map. The identified dominant fundamental frequency is treated as an image that interferes with recognition of the next fundamental frequency. A generative adversarial network removes this interference image, and a new round of multiple fundamental frequency estimation runs on the cleaned harmonic fingerprint map. Finally, repeating this image-removal step stage by stage yields the multiple fundamental frequency estimate of the complete chord.

Experiments were carried out on a piano audio database composed of random two-note and three-note chords. The results show that, compared with the classical iterative spectral deletion algorithm and the large-vocabulary chord recognition algorithm, the proposed algorithm adapts to the recognition of random chords. It is also highly robust across different pitch ranges and clearly improves overall accuracy.

Key words: Convolutional neural network, Fundamental frequency image, Generative adversarial networks, Harmonic fingerprint, Multiple fundamental frequency estimation
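The abstract describes an iterative loop: identify the dominant fundamental frequency, remove its image, and repeat on what remains. The control flow of that loop can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation.

- The input is a hypothetical magnitude spectrum with one bin per MIDI semitone, not the paper's harmonic fingerprint map.
- A weighted harmonic sum stands in for the CNN.
- Subtracting a fixed 1/h partial template stands in for the GAN. This makes the sketch closer in spirit to the classical iterative spectral deletion baseline than to the proposed method.

All names and parameters (`synthesize`, `salience`, `remove_image`, `estimate_f0s`, `stop_ratio`, `max_notes`) are hypothetical.

```python
import math

# Hypothetical semitone-resolution spectrum: one bin per MIDI pitch 0..N_BINS-1.
N_BINS = 160
LOWEST, HIGHEST = 21, 108  # piano range A0..C8 as MIDI note numbers
# Partial h of a note lies round(12 * log2(h)) semitones above it (first 8 partials).
PARTIALS = [(h, round(12 * math.log2(h))) for h in range(1, 9)]


def synthesize(notes):
    """Toy chord spectrum: partial h of every note adds amplitude 1/h to its bin."""
    spectrum = [0.0] * N_BINS
    for m in notes:
        for h, off in PARTIALS:
            if m + off < N_BINS:
                spectrum[m + off] += 1.0 / h
    return spectrum


def salience(spectrum, m):
    """Weighted harmonic sum for candidate pitch m (stand-in for the CNN)."""
    return sum(spectrum[m + off] / h for h, off in PARTIALS if m + off < N_BINS)


def remove_image(spectrum, m):
    """Subtract pitch m's 1/h partial template, clamped at zero (stand-in for the GAN)."""
    cleaned = list(spectrum)
    for h, off in PARTIALS:
        if m + off < N_BINS:
            cleaned[m + off] = max(0.0, cleaned[m + off] - 1.0 / h)
    return cleaned


def estimate_f0s(spectrum, max_notes=6, stop_ratio=0.5):
    """Pick the dominant pitch, remove its image, and repeat until salience drops."""
    found, first = [], None
    for _ in range(max_notes):
        best = max(range(LOWEST, HIGHEST + 1), key=lambda m: salience(spectrum, m))
        score = salience(spectrum, best)
        if first is None:
            first = score
        # Stop on an empty residual or when the best candidate is much weaker
        # than the first (dominant) note.
        if score <= 0.0 or score < stop_ratio * first:
            break
        found.append(best)
        spectrum = remove_image(spectrum, best)
    return found


print(sorted(estimate_f0s(synthesize([60, 64, 67]))))  # C4-E4-G4 triad
```

For the synthetic C-major triad above, the sorted result is `[60, 64, 67]`. With real piano recordings, overlapping partials and inharmonicity make a fixed template subtraction much less reliable. The proposed algorithm replaces that step with a GAN.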

CLC number: TP183