Computer Science, 2020, Vol. 47, Issue (9): 94-98. doi: 10.11896/jsjkx.190800056

• Database & Big Data & Data Science •

Database Anomaly Access Detection Based on Principal Component Analysis and Random Tree

FENG An-ran1,2, WANG Xu-ren1,2, WANG Qiu-yun2, XIONG Meng-bo1,2   

  1 College of Information Engineering, Capital Normal University, Beijing 100048, China
  2 Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  • Received: 2019-08-13 Published: 2020-09-10
  • About author: FENG An-ran, born in 1993, master. Her main research interests include data mining, cyber security and database security.
    WANG Xu-ren, born in 1972, postgraduate, Ph.D., associate professor. Her main research interests include data mining, cyber security and database security.
  • Supported by:
    Science and Technology Project of State Grid Corporation of China (5700-201972227A-0-0-00).

Abstract: As a platform for data storage and interaction, a database holds confidential and important information, which makes it a target for malicious attackers. To guard against outsiders, database administrators can restrict unauthorized access through a role-based access control system, but masquerade attacks by insiders are often harder to notice. Research on database anomaly detection based on user behavior therefore has significant practical value. This paper proposes PCA-RT, a user anomaly detection algorithm based on Principal Component Analysis (PCA) and Random Tree (RT), for detecting anomalies in database user access behavior. First, a user profile is constructed from the characteristics of the queries each user submits. Next, PCA is applied to the user profile for dimensionality reduction and feature selection. Finally, a random tree is trained as the anomaly detector. Experiments were run on a dataset constructed according to TPC-E, a new-generation database performance evaluation standard issued by the TPC (Transaction Processing Performance Council). They show that the user profile and PCA-RT are fast and effective for detecting anomalies in database user access behavior. During data preprocessing, PCA reduces the data by more than 35%. The accuracy and recall of the PCA-RT algorithm are improved by 1.78% and 9.76%, respectively. These results show that the user profile vector construction method and the PCA-RT algorithm are effective for anomaly detection of user access behavior in a TPC-E database.
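
The pipeline outlined in the abstract (profile vectors, then PCA, then a random tree) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the authors' implementation: the data are synthetic stand-ins instead of TPC-E query profiles, the 95% explained-variance cutoff is arbitrary, and the "random tree" is approximated by a scikit-learn `DecisionTreeClassifier` that considers a random subset of features at each split.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for user profile vectors (NOT the paper's TPC-E data):
# 20 correlated features driven by 3 hidden factors; label 1 = anomalous.
n_per_class, n_latent, n_features = 300, 3, 20
mixing = rng.normal(size=(n_latent, n_features))
latent = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_latent)),  # normal behavior
    rng.normal(5.0, 1.0, size=(n_per_class, n_latent)),  # anomalous behavior
])
X = latent @ mixing + 0.1 * rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: standardize, then keep enough principal components to explain
# 95% of the variance (the cutoff is an assumption, not from the paper).
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95, svd_solver="full").fit(scaler.transform(X_train))
X_train_reduced = pca.transform(scaler.transform(X_train))
X_test_reduced = pca.transform(scaler.transform(X_test))
n_kept = pca.n_components_

# Step 2: a single unpruned tree that inspects a random feature subset at
# each split, used here as an approximation of a random tree.
tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
tree.fit(X_train_reduced, y_train)
y_pred = tree.predict(X_test_reduced)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)  # recall on the anomalous class
reduction = 100.0 * (1 - n_kept / n_features)
print(f"kept {n_kept}/{n_features} components ({reduction:.0f}% fewer features)")
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Fitting the scaler and PCA on the training split only keeps test queries from influencing the projection. The numbers printed by this sketch describe the synthetic data and are not comparable to the figures reported in the abstract.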

Keywords: Anomaly detection, Database security, Principal component analysis, Random tree algorithm, TPC-E, User behavior profile

CLC Number: TP309