Computer Science, 2020, Vol. 47, Issue (9): 94-98. doi: 10.11896/jsjkx.190800056

• Database & Big Data & Data Science •

Database Anomaly Access Detection Based on Principal Component Analysis and Random Tree

FENG An-ran1,2, WANG Xu-ren1,2, WANG Qiu-yun2, XIONG Meng-bo1,2

  1 College of Information Engineering, Capital Normal University, Beijing 100048, China
  2 Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  • Received: 2019-08-13 Published: 2020-09-10
  • Corresponding author: WANG Xu-ren (wangxuren@cnu.edu.cn)
  • About author (e-mail): ann654175863@163.com
    FENG An-ran, born in 1993, master. Her main research interests include data mining, cyber security and database security.
    WANG Xu-ren, born in 1972, postgraduate, Ph.D., associate professor. Her main research interests include data mining, cyber security and database security.
  • Supported by:
    Science and Technology Project of State Grid Corporation of China Headquarters (5700-201972227A-0-0-00).

Abstract: As a platform for data storage and interaction, a database holds confidential and important information, which makes it a target for malicious attackers. Attacks from outsiders can be contained by a role-based access control system that restricts access by unauthorized users, whereas masquerade attacks from insiders are often hard to notice. Anomaly detection based on database user behavior therefore has important practical value. This paper proposes PCA-RT, an algorithm for detecting anomalous database access based on Principal Component Analysis (PCA) and Random Tree (RT). First, a profile vector of each user's database access behavior is constructed from the features of the queries the user submits. Then, PCA reduces the dimensionality of the user profiles, and a random tree is trained as the anomaly detector. The experimental dataset is constructed according to TPC-E, a new-generation database performance benchmark issued by the Transaction Processing Performance Council (TPC), and a relatively comprehensive set of user access profile features is extracted from it. Simulation results show that PCA reduces the data by more than 35% during preprocessing, and that PCA-RT improves precision and recall by 1.78% and 9.76%, respectively. These results indicate that the proposed profile construction method and the PCA-RT algorithm are fast and effective for detecting anomalous user access behavior in the TPC-E database.

Key words: Anomaly detection, Database security, Principal component analysis, Random tree algorithm, TPC-E, User behavior profile
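The first two stages of PCA-RT summarized in the abstract (building a user access profile from submitted queries, then reducing it with PCA) can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' implementation: the feature layout (one-hot command type, per-table access flags, per-table counts of referenced columns), the helpers `profile_vector` and `pca_reduce`, the sample queries, and the 95% cumulative-variance cutoff are all hypothetical. Queries are assumed to be parsed already, and the TPC-E-style table and column names are used only for flavor.

```python
import math
import random

# Hypothetical feature layout (not the paper's): one-hot SQL command,
# per-table access flags, and per-table counts of referenced columns.
COMMANDS = ["SELECT", "INSERT", "UPDATE", "DELETE"]
TABLES = ["CUSTOMER", "CUSTOMER_ACCOUNT", "TRADE"]  # hypothetical schema subset


def profile_vector(command, accessed):
    """Profile vector for one parsed query; accessed maps table -> columns used."""
    vec = [1.0 if command == c else 0.0 for c in COMMANDS]
    vec += [1.0 if t in accessed else 0.0 for t in TABLES]
    vec += [float(len(accessed.get(t, []))) for t in TABLES]
    return vec


def mean_center(rows):
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def eigenpairs(cov, iters=500, seed=0):
    """Eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(d):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((max(lam, 0.0), v))
        # Deflation: subtract the component just found before the next pass.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return sorted(pairs, key=lambda p: p[0], reverse=True)


def pca_reduce(rows, threshold=0.95):
    """Keep the fewest components whose cumulative variance share >= threshold."""
    centered, _ = mean_center(rows)
    pairs = eigenpairs(covariance(centered))
    total = sum(lam for lam, _ in pairs)
    k, acc = len(pairs), 0.0
    for i, (lam, _) in enumerate(pairs, start=1):
        acc += lam
        if total == 0 or acc / total >= threshold:
            k = i
            break
    comps = [vec for _, vec in pairs[:k]]
    projected = [[sum(x * c for x, c in zip(r, comp)) for comp in comps]
                 for r in centered]
    return projected, k


if __name__ == "__main__":
    queries = [  # hypothetical parsed queries from one user's session
        ("SELECT", {"CUSTOMER": ["C_ID", "C_L_NAME"]}),
        ("SELECT", {"CUSTOMER": ["C_ID"], "CUSTOMER_ACCOUNT": ["CA_ID"]}),
        ("UPDATE", {"TRADE": ["T_ST_ID"]}),
        ("SELECT", {"TRADE": ["T_ID", "T_DTS", "T_QTY"]}),
        ("INSERT", {"TRADE": ["T_ID", "T_DTS", "T_QTY", "T_ST_ID"]}),
    ]
    rows = [profile_vector(cmd, acc) for cmd, acc in queries]
    projected, k = pca_reduce(rows)
    d = len(rows[0])
    print(f"kept {k} of {d} dimensions ({1 - k / d:.0%} reduction)")
```

The reduction reported by this sketch is 1 − k/d over feature dimensions; the abstract does not say whether its "more than 35%" figure is measured the same way. Power iteration with deflation is used only to keep the example self-contained; a production pipeline would more likely call a library eigensolver or SVD.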

CLC number: TP309
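The detector stage of PCA-RT, as described in the abstract, trains a random tree on the reduced user profiles. As a hedged illustration, the sketch below grows an unpruned decision tree that evaluates only `k` randomly chosen features at each node, which is the idea behind Weka's `weka.classifiers.trees.RandomTree` (a tree that considers K randomly chosen attributes at each node and performs no pruning). The binary `normal`/`anomaly` labels, the information-gain criterion, the stopping rule, the synthetic Gaussian data, and all function names are assumptions for illustration, not the paper's configuration. The `precision_recall` helper uses the standard definitions, precision = TP/(TP+FP) and recall = TP/(TP+FN), the two metrics the abstract reports.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (gain, feature, threshold) among the given features by information gain."""
    base, n = entropy(labels), len(labels)
    best = (0.0, None, None)
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best


def build_random_tree(rows, labels, k, rng, min_leaf=1):
    """Unpruned tree; each node examines only k randomly sampled features."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) <= min_leaf:
        return majority
    features = rng.sample(range(len(rows[0])), min(k, len(rows[0])))
    _, f, t = best_split(rows, labels, features)
    if f is None:  # no positive-gain split among the sampled features
        return majority
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_random_tree([rows[i] for i in li], [labels[i] for i in li], k, rng, min_leaf),
            build_random_tree([rows[i] for i in ri], [labels[i] for i in ri], k, rng, min_leaf))


def predict(tree, row):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def precision_recall(y_true, y_pred, positive="anomaly"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(42)

    def sample(label, center, n):
        # Hypothetical 2-D points standing in for PCA-projected profiles.
        return [([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label) for _ in range(n)]

    data = sample("normal", 0.0, 80) + sample("anomaly", 3.0, 20)
    rng.shuffle(data)
    train, test = data[:70], data[70:]
    tree = build_random_tree([x for x, _ in train], [y for _, y in train], k=1, rng=rng)
    preds = [predict(tree, x) for x, _ in test]
    p, r = precision_recall([y for _, y in test], preds)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Setting `k` to the number of features turns this into an ordinary greedy decision tree; a small `k` is what makes each tree random, and it is also what allows a random forest to average many such trees.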