计算机科学 ›› 2015, Vol. 42 ›› Issue (Z6): 462-464.

• 数据挖掘 • 上一篇    下一篇

一个基于压缩后缀数组的乐纹索引算法

刘学政,史有群,罗辛,陶然   

  1. 东华大学计算机科学与技术学院 上海201620,东华大学计算机科学与技术学院 上海201620,东华大学计算机科学与技术学院 上海201620,东华大学计算机科学与技术学院 上海201620
  • 出版日期:2018-11-14 发布日期:2018-11-14

Index Algorithm for Audio Fingerprints Based on Compressed Suffix Array

LIU Xue-zheng, SHI You-qun, LUO Xin and TAO Ran   

  • Online:2018-11-14 Published:2018-11-14

摘要: 在基于乐纹的音乐检索系统中,提取的乐纹的多少决定了检索结果的匹配度,这就造成了数据库大小与检索匹配度不能兼顾的矛盾。提出使用压缩后缀数组来压缩乐纹索引的方法,解决全文索引时索引空间过大的问题。主要利用有序乐纹数据中较高位特征出现重复 的概率大的特点,使用游程编码对乐纹序列进行无损压缩。 实验结果表明,该方法在包含2000首歌曲的数据库中仅需要使用原来80%的乐纹数据空间,在包含12000首歌曲的数据库中只需要使用原来30%的乐纹数据空间。与传统的后缀数组索引方法相比,该方法需要的索引存储空间仅为原来的60%。

Abstract: In music retrieval methods based on audio fingerprints,a large database is required in order to compare with the fingerprints leading to low retrieval efficiency.In this paper,we proposed a method for index compression using a compressed suffix array,in order to overcome the disadvantage of large memory cost with full-text indexing.The proposed method compresses data sequences lossless by run length encoding,mainly taking advantage of the fact that the repetitivecharacters occur frequently in higher bits of the sorted audio fingerprint data.The experimental results show that the proposed method,compared with the conventional method,only needs 30% of the space of an audio fingerprints database for a music database consisting of 12000 songs,and around 80% of the index space for a database of 2000 songs.Moreover,theentire space cost is reduced to around 60%,compared with the method based on the suffix array.

Key words: Audio fingerprint,Compressed suffix array,Index compression,Run length encoding,Vertical code

[1] Casey M A,Veltkamp R,Goto M,et al.Content-based music information retrieval:current directions and future challenges[J].Proceedings of the IEEE,2008,96(4):668-696
[2] Xiao Q,Xin Luo,Saito N,et al.Index Compression for Au-dio Fingerprinting Systems Based on Compressed Suffix Array[J].International Journal of Information and Education Technology,2013,3(4):455-460
[3] Bellettini C,Mazzin G.A Framework for Robust Audio Fingerprinting [J].Journal of Communications,2010,5(5):409-424
[4] Xiao Qing-mei,Saito N,Luo Xin,et al.Index Compression for Audio Fingerprinting Systems Based on Compressed Suffix Array[J].International Journal of Information and Education Technology,2013,3(4):455-460
[5] Haitsma J,Kalker T.Ahighly robust audio fingerprinting system [C]∥Proc.3rd International Conference on Music Information Retrieval.2002
[6] 李伟,李晓强,陈芳,等.数字音频乐纹技术综述[J].小型微型计算机系统,2008,9(11):2124-2130
[7] Wang A L.An Industrial-Strength Audio Search Algorithm [C]∥Proc.the 4th International Conference on Music Information Retrieval(ISMIR 2003).2003
[8] Miller M,Rodriguez M,Cox I.Audio fingerprinting:Nearest nei-ghbor search in high dimensional binary spaces [J].Journal of VLSI Signal Processing,2005,41(3):285-291
[9] Manber U,Myers G.Suffix arrays:a new method for on-linestring searches [C] ∥1st ACM-SIAM Symposium on Discrete Algorithms.1990
[10] Jensen R,Shen Q.Semantics-preserving Dimensionality Reduction:Rough and Fuzzy-rough-based Approaches[J].IEEE Transactions on Knowledge and Data Engineering,2004,16(12):1457-1471
[11] 徐英进,蔡锐,蔡莲红.一种基于 “乐纹” 的海量音乐检索系统[C]∥第二届和谐人机环境联合学术会议(HHME2006)——第15届中国多媒体学术会议(NCMT’06) 论文集.北京:清华大学出版社,2006
[12] 姚全珠,张楠,杨增辉,等.基于压缩后缀数组技术的搜索引擎[J].计算机工程,2008,34(10):83-85

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!