Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 24-28. doi: 10.11896/JsJkx.191200022

• Artificial Intelligence •

Main Melody Extraction Method Based on Saliency Enhancement

JIN Wen-qing, HAN Fang

  1. School of Information Science and Technology, Donghua University, Shanghai 201620, China
  • Published: 2020-07-07
  • Corresponding author: HAN Fang (yadiahan@dhu.edu.cn)
  • Author email: Jwq_wem@163.com
  • About the authors: JIN Wen-qing, born in 1996, master. His main research interests include deep learning and music information retrieval.
    HAN Fang, born in 1981, Ph.D., professor. Her main research interests include intelligent systems and neurodynamics.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (11572084, 11972115).

Abstract: Main melody extraction is a very difficult task in music information retrieval. In polyphonic music, different sound sources interfere with one another, which makes the pitch sequence of the main melody discontinuous and lowers the raw pitch accuracy of the melody. To address this problem, a CNN-CRF model with enhanced pitch saliency representation and automatic melody tracking is designed. To better extract harmonic information, structured data are used to enhance the initial saliency representation computed by SF-NMF, and melody characteristics are combined with pitch smoothness constraints in a dynamic programming framework to find the optimal evolution path in pitch space. Experiments show that the proposed method achieves good melody extraction results, and that its raw pitch accuracy on both test datasets is higher than that of the other reference methods. Comparing different inputs verifies that structured data can enhance the saliency representation and compensate for the pitch misjudgments made by SF-NMF.

Key words: CNN-CRF, Melody extraction, Music information retrieval, Music signal processing, Pitch saliency enhancement
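
The abstract describes finding an optimal path through pitch space under a smoothness constraint with dynamic programming. The following is a minimal, generic Viterbi-style sketch of that idea, not the authors' CNN-CRF model: the function name `track_melody`, the linear `jump_penalty`, and the toy salience values are all assumptions made for illustration.

```python
def track_melody(salience, jump_penalty=0.5):
    """Return one pitch bin per frame, maximizing total salience
    minus a penalty on pitch jumps between adjacent frames.

    salience: list of frames, each a list of salience values per pitch bin.
    jump_penalty: hypothetical cost per bin of pitch change (assumption).
    """
    n_bins = len(salience[0])
    # score[b] = best cumulative score of any path ending in bin b
    score = list(salience[0])
    backpointers = []
    for frame in salience[1:]:
        new_score, pointers = [], []
        for b in range(n_bins):
            # Best predecessor bin under the linear smoothness penalty.
            prev = max(range(n_bins),
                       key=lambda p: score[p] - jump_penalty * abs(b - p))
            new_score.append(score[prev] - jump_penalty * abs(b - prev) + frame[b])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final bin to recover the full path.
    path = [max(range(n_bins), key=lambda b: score[b])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy salience matrix (4 frames x 4 pitch bins); frame 2 has a spurious
# peak in bin 3, standing in for interference from another sound source.
toy_salience = [
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.8, 0.1, 0.0],
    [0.0, 0.3, 0.0, 0.9],
    [0.0, 0.7, 0.2, 0.0],
]
smooth_path = track_melody(toy_salience, jump_penalty=0.5)
framewise_path = track_melody(toy_salience, jump_penalty=0.0)
```

With the penalty set to zero the tracker reduces to a frame-wise maximum and follows the spurious peak, while a positive penalty keeps the path on the continuous pitch line. This is the kind of discontinuity the smoothness constraint is meant to suppress.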

CLC Number: 

  • TP391