Musical Note Recognition of Musical Instruments Based on MFCC and Constant Q Transform

doi:10.11896/jsjkx.190100224

Abstract

Abstract: Musical note recognition is an important research topic in music signal analysis and processing. It provides a technical basis for automatic music transcription, instrument tuning, music database retrieval, and electronic music synthesis. Conventional methods recognize a note by estimating its fundamental frequency and matching it one-to-one against the standard frequencies. This one-to-one matching is difficult, the error grows as the fundamental frequency increases, and the range of recognizable fundamental frequencies is narrow. This paper therefore treats note recognition as a classification problem. First, an audio library of the notes to be recognized is built and, given the importance of low-frequency information in music signals, Mel Frequency Cepstrum Coefficients (MFCC) and the Constant Q Transform (CQT) are chosen as the features extracted from the note signals. Next, MFCC and CQT are used as input both individually, as single features, and together, as a fused feature. Combining the strengths of the Softmax regression model in multi-class problems with the nonlinear mapping and self-learning abilities of the BP neural network, a BP neural network multi-class recognizer based on the Softmax regression model is constructed. In the MATLAB R2016a simulation environment, the feature parameters are fed to the classifier for learning and training, and the network parameters are adjusted to search for the optimal solution. Comparative experiments are run with different numbers of training samples. The results show that with the fused feature (MFCC+CQT) as input, 25 classes of notes ranging from the great octave to the three-line octave can be recognized, with an average recognition rate of 95.6%; in the recognition process, the CQT feature contributes more than the MFCC feature. The experimental data show that treating note recognition as classification over MFCC and CQT features achieves good recognition results and is not limited by the range of the note's fundamental frequency.

Key words: BP neural network, Constant Q transform, Feature fusion, Mel frequency cepstrum coefficients, Music note library, Softmax regression model
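The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the starting frequency, the number of bins per octave, the feature lengths, and the function names are all assumptions chosen for illustration, and feature fusion is assumed here to mean simple concatenation of the two feature vectors. The mel-scale and constant Q formulas are the commonly used textbook forms.

```python
import math

def hz_to_mel(f_hz):
    """Commonly used mel-scale mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced CQT bin centres: f_k = f_min * 2^(k / b)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def cqt_q_factor(bins_per_octave):
    """Constant ratio of centre frequency to bandwidth: Q = 1 / (2^(1/b) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def fuse_features(mfcc_vec, cqt_vec):
    """Assumed fusion scheme: concatenate the MFCC and CQT vectors."""
    return list(mfcc_vec) + list(cqt_vec)

def softmax(scores):
    """Numerically stable softmax turning output-layer scores into class probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Hypothetical settings: start near C2 (about 65.41 Hz), one bin per semitone.
    freqs = cqt_center_frequencies(f_min=65.41, bins_per_octave=12, n_bins=49)
    print("first and last CQT centre (Hz):", freqs[0], freqs[-1])
    print("Q factor for 12 bins per octave:", cqt_q_factor(12))

    # Placeholder feature vectors; real ones would come from the audio frames.
    fused = fuse_features([0.0] * 13, [0.0] * 49)
    print("fused feature length:", len(fused))

    # Placeholder output-layer scores for a 25-class problem.
    probs = softmax([0.1 * i for i in range(25)])
    print("predicted class index:", probs.index(max(probs)))
```

Because the CQT bins are spaced geometrically, each octave gets the same number of bins, which matches the way musical pitch is organized and gives finer absolute resolution at low frequencies than a linearly spaced transform.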

CLC Number:

TP391

CHEN Yan-wen, LI Kun, HAN Yan, WANG Yan-ping. Musical Note Recognition of Musical Instruments Based on MFCC and Constant Q Transform[J]. Computer Science, 2020, 47(3): 149-155. https://doi.org/10.11896/jsjkx.190100224

