Computer Science ›› 2020, Vol. 47 ›› Issue (5): 90-95. doi: 10.11896/jsjkx.190300150

• Database & Big Data & Data Science •

Multi-label Learning Algorithm Based on Association Rules in Big Data Environment

WANG Qing-song, JIANG Fu-shan, LI Fei   

  1. College of Information, Liaoning University, Shenyang 110036, China
  • Received: 2019-03-28 Online: 2020-05-15 Published: 2020-05-19
  • About author: WANG Qing-song, born in 1974, associate professor. His main research interests include big data and data mining.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61802160).

Abstract: In traditional single-label mining research, each sample belongs to exactly one label and the labels are mutually exclusive. In multi-label learning, a sample may correspond to multiple labels, and the labels are often correlated with one another. Research on the correlation between labels has gradually become a hot topic in multi-label learning. First, to adapt to the big data environment, the traditional association rule mining algorithm Apriori is parallelized and improved: a Hadoop-based parallel algorithm, Apriori_ING, is proposed, which parallelizes candidate set generation, pruning, and support counting. The frequent itemsets and association rules obtained by Apriori_ING are then used to generate label sets, for which an inference-engine-based label set generation algorithm, IETG, is proposed. Next, the label sets are applied to multi-label learning, and a multi-label learning algorithm, FreLP, is proposed. FreLP uses association rules to generate label sets, decomposes the original label set into multiple subsets, and then trains classifiers with the LP algorithm. FreLP is compared with existing multi-label learning algorithms, and experimental results show that it obtains better results under different evaluation indicators.
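The two building blocks named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Apriori_ING, IETG, or FreLP: there is no Hadoop parallelization and no rule inference step, and it assumes that LP refers to the label powerset transformation. The function names (frequent_label_sets, label_powerset_targets), the toy label rows, and the support threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_label_sets(label_rows, min_support):
    """Apriori-style level-wise search (illustrative, single machine).

    Returns a dict mapping each frequent label set (frozenset) to the
    number of rows that contain it.
    """
    rows = [frozenset(r) for r in label_rows]
    # Level 1 candidates: every single label seen in the data.
    current = [frozenset([label]) for label in sorted({l for r in rows for l in r})]
    frequent = {}
    k = 1
    while current:
        # Support counting: how many rows contain each candidate.
        counts = {c: sum(1 for r in rows if c <= r) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-sets into (k+1)-sets.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: keep a candidate only if all its (k-1)-subsets are frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def label_powerset_targets(label_rows, subset):
    """Label powerset transformation restricted to one label subset.

    Each distinct combination of labels from `subset` becomes one class,
    so any ordinary multi-class classifier can be trained on the result.
    """
    subset = set(subset)
    return [tuple(sorted(set(r) & subset)) for r in label_rows]


# Hypothetical toy data: the label set of each training sample.
rows = [
    {"beach", "sea"},
    {"beach", "sea", "sunset"},
    {"sea", "sunset"},
    {"mountain"},
    {"beach", "sea"},
]
freq = frequent_label_sets(rows, min_support=2)
classes = label_powerset_targets(rows, {"beach", "sea"})
```

In this sketch, a frequent label set such as {"beach", "sea"} plays the role of one subset of the original label set, and the tuples returned by label_powerset_targets are the class values a multi-class learner would be trained on for that subset.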

Key words: Apriori, Association rule, Hadoop, LP, Multi-label learning

CLC Number: 

  • TP301