Computer Science (计算机科学), 2020, Vol. 47, Issue (5): 90-95. doi: 10.11896/jsjkx.190300150
WANG Qing-song (王青松), JIANG Fu-shan (姜富山), LI Fei (李菲)
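Abstract: In traditional single-label mining, each sample belongs to exactly one label, and labels are mutually exclusive. In multi-label learning, by contrast, a sample may correspond to several labels, and those labels are often correlated. Label correlation has therefore become an active topic in multi-label learning research. First, to suit big-data environments, the traditional association rule mining algorithm Apriori is parallelized, yielding the Hadoop-based parallel algorithm Apriori_ING, in which each node independently performs candidate itemset generation, pruning, and support counting, so that the benefits of parallelism are fully exploited. Label sets are generated from the frequent itemsets and association rules obtained by Apriori_ING, and an inference-engine-based label set generation algorithm, IETG, is proposed. The label sets are then applied to multi-label learning, giving the multi-label learning algorithm FreLP. FreLP uses association rules to generate label sets, decomposes the original label set into multiple subsets, and then trains classifiers with the LP algorithm. Experiments comparing FreLP with existing multi-label learning algorithms show that the proposed algorithm achieves better results under different evaluation metrics.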
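The three Apriori steps named in the abstract (candidate itemset generation, pruning, and support counting) can be illustrated with a minimal single-machine sketch. This is classic Apriori applied to label sets, not the Hadoop-based Apriori_ING, whose implementation details are not given here. The function name `apriori`, the `min_support` parameter, and the toy `label_sets` data are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Support counting: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: each entry is the label set of one sample.
label_sets = [
    {"beach", "sea", "sunset"},
    {"beach", "sea"},
    {"sea", "sunset"},
    {"beach", "sea", "sunset"},
    {"mountain"},
]
frequent_itemsets = apriori(label_sets, min_support=3)
```

In this toy data, the pair {beach, sunset} occurs in only two samples, so the three-label candidate {beach, sea, sunset} is pruned before any support counting is done for it. In a parallel setting, the support counting pass is the part that most naturally splits across nodes, since each node can count over its own share of the samples.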