Computer Science, 2018, Vol. 45, Issue (3): 124-130. doi: 10.11896/j.issn.1002-137X.2018.03.020

• Information Security •

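Study on Malicious URL Detection Based on Threat Intelligence Platform

WANG Xin, WU Yang and LU Zhi-gang

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, and School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
  • Online: 2018-03-15 Published: 2018-11-13
  • Funding:
    This work was supported by a Chinese Academy of Sciences fund project (Y5X0071116), the Key Laboratory of Network Assessment Technology of the Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology.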

Abstract: As Internet applications have penetrated every aspect of daily life, malicious URLs have become hard to guard against and pose a serious threat to people's property and privacy. Mainstream defenses rely mainly on blacklists, which cannot detect malicious URLs that are not on the list. Introducing machine learning to improve malicious URL detection is therefore a major research direction, but existing approaches are limited by the short-text nature of URLs: the extracted features are too uniform, and detection performance suffers. To address these challenges, this paper designs a malicious URL detection system based on a threat intelligence platform. The system extracts three classes of features from the URL string (structural features, intelligence features, and sensitive-word features) to train classifiers, uses a voting mechanism over multiple classifiers to decide the class of a URL, and updates the threat intelligence automatically. Experimental results show that the method detects malicious URLs with an accuracy above 96%.

Key words: Malicious URL, Threat intelligence, Classifier, Voting mechanism
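
The pipeline summarized in the abstract (three feature classes feeding several classifiers whose outputs are combined by voting) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the individual features, the word list `SENSITIVE_WORDS`, the in-memory set `KNOWN_BAD_HOSTS` standing in for a threat intelligence platform lookup, and the three rule-based functions standing in for trained classifiers.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-ins: the abstract does not list its sensitive words
# or describe how the threat intelligence platform is queried.
SENSITIVE_WORDS = ("login", "verify", "account", "update")
KNOWN_BAD_HOSTS = {"bad.example"}


def extract_features(url):
    """Return illustrative structural, intelligence, and sensitive-word features."""
    host = urlparse(url).hostname or ""
    lowered = url.lower()
    return {
        # Structural features (assumed examples).
        "length": len(url),
        "dots": host.count("."),
        "ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Intelligence feature: membership in the toy host set.
        "intel_hit": host in KNOWN_BAD_HOSTS,
        # Sensitive-word feature: how many listed words occur in the URL.
        "sensitive": sum(word in lowered for word in SENSITIVE_WORDS),
    }


# Three toy rules standing in for trained classifiers (assumed thresholds).
def clf_structure(f):
    return "malicious" if f["ip_host"] or f["length"] > 75 else "benign"


def clf_intel(f):
    return "malicious" if f["intel_hit"] else "benign"


def clf_words(f):
    return "malicious" if f["sensitive"] >= 2 else "benign"


CLASSIFIERS = (clf_structure, clf_intel, clf_words)


def vote(url):
    """Label a URL by simple majority vote over the three classifiers."""
    features = extract_features(url)
    ballots = Counter(clf(features) for clf in CLASSIFIERS)
    label, _count = ballots.most_common(1)[0]
    return label
```

With an odd number of classifiers and two labels, a simple majority can never tie. A call such as `vote("http://bad.example/login/verify-account")` is labeled malicious here because two of the three toy rules agree, even though the structural rule alone would not flag it.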

