Computer Science, 2018, Vol. 45, Issue (11): 256-260. doi: 10.11896/j.issn.1002-137X.2018.11.040
CAI Liu-ping1, XIE Hui2, ZHANG Fu-quan3, ZHANG Long-fei3
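Abstract: To improve the efficiency and accuracy of big data mining, this paper applies sparse representation and feature weighting to big data processing. First, the features of the big data are classified by finding a sparse solution of a system of linear equations; in doing so, vector norms are used to turn the search for the sparse solution into the optimization of an objective function. After feature classification, feature extraction is performed to reduce the dimensionality of the data. Finally, features are weighted according to how the data are distributed within each class, which completes the mining process. Experimental results show that, compared with common feature extraction and feature weighting algorithms, the proposed algorithm shows clear advantages in both recall and precision.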
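The abstract does not state the objective function, solver, extraction method, or weighting formula, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes the common l1 relaxation of the sparse-solution problem, min_x 0.5*||A x - y||_2^2 + lam*||x||_1, where the columns of A are unit-norm training samples and y is the sample to classify; it solves this with iterative soft-thresholding (ISTA) and assigns y to the class whose coefficients alone reconstruct it with the smallest residual (the standard sparse-representation classification rule, not confirmed as the authors' rule). The fisher_weights function is a hypothetical stand-in for the class-distribution weighting step, scoring each feature by its between-class over within-class scatter; the dimensionality-reduction step is omitted. All function names and the parameters lam, iters, and eps are illustrative assumptions.

```python
import math

# Sketch under assumptions: Lasso objective + ISTA solver + class-residual rule.
# fisher_weights is a hypothetical weighting scheme, not the paper's formula.


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(A, x):
    """A x, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """A^T r, with A stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def build_dictionary(samples):
    """Stack training samples as unit-norm columns of A (rows = features)."""
    cols = []
    for s in samples:
        norm = math.sqrt(sum(v * v for v in s)) or 1.0  # leave all-zero samples as-is
        cols.append([v / norm for v in s])
    return [list(row) for row in zip(*cols)]


def ista(A, y, lam=0.01, iters=1000):
    """Minimise 0.5*||A x - y||_2^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = sum(a * a for row in A for a in row)  # ||A||_F^2 >= largest eigenvalue of A^T A
    step = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [ax - yi for ax, yi in zip(matvec(A, x), y)]  # residual A x - y
        g = rmatvec(A, r)                                # gradient A^T (A x - y)
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, g)]
    return x


def src_classify(samples, labels, y, lam=0.01):
    """Label y with the class whose coefficients reconstruct it with least residual."""
    A = build_dictionary(samples)
    x = ista(A, y, lam)
    best, best_res = None, math.inf
    for c in sorted(set(labels)):
        xc = [xi if lab == c else 0.0 for xi, lab in zip(x, labels)]  # class-c coefficients only
        res = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, matvec(A, xc))))
        if res < best_res:
            best, best_res = c, res
    return best


def fisher_weights(samples, labels, eps=1e-12):
    """Hypothetical per-feature weight: between-class scatter / within-class scatter."""
    overall = [sum(col) / len(samples) for col in zip(*samples)]
    weights = []
    for k in range(len(samples[0])):
        between = within = 0.0
        for c in set(labels):
            vals = [s[k] for s, lab in zip(samples, labels) if lab == c]
            mu = sum(vals) / len(vals)
            between += len(vals) * (mu - overall[k]) ** 2
            within += sum((v - mu) ** 2 for v in vals)
        weights.append(between / (within + eps))  # eps avoids division by zero
    return weights


samples = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
labels = ["a", "a", "b", "b"]
print(src_classify(samples, labels, [1.0, 0.0, 0.5]))  # a
print(src_classify(samples, labels, [0.0, 1.0, 0.5]))  # b
print(fisher_weights(samples, labels)[2])              # 0.0
```

Plain Python lists keep the sketch dependency-free; at big-data scale one would switch to vectorized linear algebra and a faster solver. Using ||A||_F^2 as the step-size constant is conservative, since it upper-bounds the largest eigenvalue of A^T A; this guarantees ISTA convergence at the cost of smaller steps. In the toy data, the third feature takes the same values in both classes, so its weight is 0.0.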