Computer Science ›› 2018, Vol. 45 ›› Issue (11): 256-260. doi: 10.11896/j.issn.1002-137X.2018.11.040

• Artificial Intelligence •

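Study on Big Data Mining Method Based on Sparse Representation and Feature Weighting

CAI Liu-ping1, XIE Hui2, ZHANG Fu-quan3, ZHANG Long-fei3

    (School of Computer Science & Engineering, Tianhe College of Guangdong Polytechnic Normal University, Guangzhou 510540, China)1
    (Department of Computer Sciences and Technology, Tsinghua University, Beijing 100084, China)2
    (School of Software, Beijing Institute of Technology, Beijing 100081, China)3
  • Received: 2018-02-14 Published: 2019-02-25
  • About the authors: CAI Liu-ping (born 1981), female, master's degree, lecturer and senior engineer; her main research interests include big data and software engineering; E-mail: 4425335@qq.com (corresponding author). XIE Hui (born 1981), male, Ph.D., associate professor; his main research interests include computing networks and big data processing. ZHANG Fu-quan (born 1975), male, Ph.D., associate professor, CCF member; his main research interests include creative computing and big data. ZHANG Long-fei (born 1977), male, Ph.D., associate professor; his main research interests include digital media science and big data.
  • Supported by:
    This work was supported by the National Key Technology R&D Program project of the Ministry of Culture (2012BAH38F00); the Guangdong Province Applied Talent Training Course Construction Project for Undergraduate Universities, "Competency-Oriented Construction of Applied Computer Courses" (2017SZ03); the Guangdong Province Science and Technology Plan Project, "Research and Development of a Service System Based on Pharmaceutical E-commerce Big Data" (2016A010101029); and the Key Discipline Construction Project of Computer Science and Technology of Tianhe College of Guangdong Polytechnic Normal University (Xjt201702).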

Abstract: To improve the efficiency and accuracy of big data mining, this paper applies sparse representation and feature weighting to big data processing. First, the features of the big data are classified by finding a sparse solution of a system of linear equations; in solving for the sparse solution, a vector norm is used to recast the problem as the optimization of an objective function. After feature classification, feature extraction is performed to reduce the dimensionality of the data. Finally, the data are weighted according to their distribution within each class to carry out big data mining. Experimental results show that, compared with common feature extraction and feature weighting algorithms, the proposed algorithm has a clear advantage in both recall and precision.
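The abstract does not say which norm or solver the authors use. As a rough illustration only, the sketch below assumes the common L1 relaxation, in which a sparse solution of `A x = y` is sought by minimizing `0.5 * ||A x - y||_2^2 + lam * ||x||_1` with iterative soft-thresholding (ISTA). The function names, the synthetic data, and the values of `lam` and `n_iter` are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(A, y, lam=0.05, n_iter=500):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with ISTA.

    This is a generic solver used here as an assumption; it is not
    claimed to be the method used in the paper.
    """
    # Lipschitz constant of the gradient: squared spectral norm of A.
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / lipschitz, lam / lipschitz)
    return x


# Synthetic example: an underdetermined system with a 3-sparse solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)          # unit-norm columns (dictionary atoms)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
y = A @ x_true

x_hat = ista(A, y)
print("non-zero coefficients:", np.count_nonzero(x_hat))
print("residual norm:", np.linalg.norm(A @ x_hat - y))
```

In a sparse-representation classifier, the columns of `A` would typically be training samples grouped by class, and a test sample `y` would be assigned to the class whose coefficients reconstruct it with the smallest residual; whether the paper follows that exact scheme is not stated in the abstract.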

Key words: Big data, Data mining, Feature extraction, Feature weighting, Sparse representation

CLC Number: 

  • TP301