Computer Science ›› 2024, Vol. 51 ›› Issue (1): 327-334. doi: 10.11896/jsjkx.230100116

• Information Security •

Cryptocurrency Mining Malware Detection Method Based on Sample Embedding

FU Jianming1, JIANG Yuqian1, HE Jia2, ZHENG Rui3, SURI Guga1, PENG Guojun1

  1 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
  2 Technology Center of Songshan Laboratory, Zhengzhou 450046, China
  3 College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475000, China
  • Received: 2023-01-30 Revised: 2023-07-11 Online: 2024-01-15 Published: 2024-01-12
  • Corresponding author: FU Jianming (jmfu@whu.edu.cn)
  • About author: FU Jianming, born in 1969, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 07112S). His main research interests include system security and mobile security.
  • Supported by:
    National Natural Science Foundation of China (61972297, 62172308, 62272351) and National Key R&D Program of China (2021YFB3101201).

Abstract: Owing to its high profitability and anonymity, cryptocurrency mining malware poses a serious threat to computer users and causes them substantial losses. To counter this threat, machine learning detectors based on static software features usually either select a single type of static feature or use ensemble learning to fuse the detection results obtained from different types of static features. Both approaches ignore the intrinsic relationships between different types of static features, so their detection rates leave room for improvement. Starting from the intrinsic hierarchical relationships within mining malware, this paper extracts, from the bottom up, the basic blocks, control flow graphs and function call graphs of samples as static features. It trains a three-layer model that embeds each of these features as vectors, progressively aggregates the features from the lowest level to the highest, and finally feeds the top-level features into a classifier to detect mining malware. To simulate detection in a real-world setting, the model is first trained on a small experimental dataset and then tested on another, much larger dataset. Experimental results show that the three-layer embedding model outperforms machine learning models proposed in recent years on mining malware detection, improving recall and accuracy over the other models by more than 7% and 3%, respectively.

Key words: Cryptocurrency mining malware, Static analysis, Machine learning, Graph embedding
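
The bottom-up aggregation described in the abstract (basic blocks, then control flow graphs, then the function call graph) can be illustrated with a minimal sketch. This is not the authors' model: the hash-based token vectors, the single round of neighbor averaging, the mean pooling, the width `DIM`, and all function names and the toy sample below are assumptions chosen only to show how vectors flow upward through the three levels. In the paper each level is a trained model, and the final vector is passed to a classifier.

```python
import hashlib
from typing import Dict, List, Tuple

DIM = 8  # toy embedding width (assumption, not from the paper)

Vector = List[float]
Edges = List[Tuple[int, int]]


def token_vector(token: str) -> Vector:
    """Deterministic stand-in for a learned token embedding (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean; returns a zero vector for an empty list."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_block(instructions: List[str]) -> Vector:
    """Level 1: basic block -> vector, by averaging instruction token vectors."""
    tokens = [tok for ins in instructions for tok in ins.replace(",", " ").split()]
    return mean([token_vector(t) for t in tokens])


def embed_graph(node_vectors: List[Vector], edges: Edges) -> Vector:
    """Levels 2 and 3: one round of neighbor averaging, then mean pooling."""
    neighbors: Dict[int, List[int]] = {i: [i] for i in range(len(node_vectors))}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    updated = [
        mean([node_vectors[j] for j in neighbors[i]])
        for i in range(len(node_vectors))
    ]
    return mean(updated)


def embed_sample(functions: List[Tuple[List[List[str]], Edges]], call_edges: Edges) -> Vector:
    """Basic blocks -> one vector per function (CFG) -> one vector per sample (FCG)."""
    function_vectors = []
    for blocks, cfg_edges in functions:
        block_vectors = [embed_block(b) for b in blocks]
        function_vectors.append(embed_graph(block_vectors, cfg_edges))
    return embed_graph(function_vectors, call_edges)


# Hypothetical two-function sample: (basic blocks, CFG edges) per function.
main_fn = ([["push ebp", "mov ebp, esp"], ["call hash_loop", "ret"]], [(0, 1)])
hash_fn = (
    [["xor eax, eax"], ["add eax, 1", "cmp eax, ecx", "jl loop"], ["ret"]],
    [(0, 1), (1, 1), (1, 2)],
)
sample_vector = embed_sample([main_fn, hash_fn], call_edges=[(0, 1)])
print(len(sample_vector))
```

The point of the sketch is the data flow rather than the arithmetic: each level consumes the vectors produced by the level below it, so relationships between the three feature types are preserved instead of being fused only at the decision stage.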

CLC Number:

  • TP311