Computer Science (计算机科学), 2019, Vol. 46, Issue (5): 92-99. doi: 10.11896/j.issn.1002-137X.2019.05.014
李翼宏, 刘方正, 杜镇宇
LI Yi-hong, LIU Fang-zheng, DU Zhen-yu
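Abstract: Traditional malicious code detection techniques rely on large numbers of labeled samples. Newly emerging malicious code, however, usually has few labeled samples, so traditional machine-learning detection methods struggle to perform well on it. To address this problem, this paper studies a malicious code detection algorithm based on improved active learning. It proposes a sample selection strategy based on Maximum Distance and a sample labeling strategy based on Minimum Risk Estimate, which together enable malicious code detection when few labeled samples are available. Experimental results show that the algorithm achieves better overall detection performance than methods that do not use active learning. With labeled samples making up 10% of the data, it outperforms active learning with a random selection strategy, and in time performance it outperforms active learning with a manual labeling strategy.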
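The abstract names a Maximum Distance selection strategy but does not define it. One plausible reading is farthest-first selection: query the unlabeled sample whose nearest labeled neighbour is furthest away. The sketch below illustrates that reading only. The use of Hamming distance over binary feature vectors, and the names `hamming` and `select_max_distance`, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_max_distance(
    labeled: Sequence[Sequence[int]],
    unlabeled: Sequence[Sequence[int]],
) -> int:
    """Return the index of the unlabeled sample farthest from the labeled set.

    Assumption: the distance from a sample to the labeled set is its
    distance to the closest labeled sample (farthest-first selection).
    Ties are broken in favour of the lowest index.
    """
    if not labeled or not unlabeled:
        raise ValueError("both sample sets must be non-empty")
    best_index, best_distance = -1, -1
    for i, candidate in enumerate(unlabeled):
        # Distance to the nearest already-labeled sample.
        nearest = min(hamming(candidate, known) for known in labeled)
        if nearest > best_distance:
            best_index, best_distance = i, nearest
    return best_index


if __name__ == "__main__":
    # Hypothetical binary feature vectors, e.g. presence of API calls.
    labeled = [[0, 0, 0, 0], [1, 1, 0, 0]]
    unlabeled = [[0, 0, 0, 1], [1, 1, 1, 1], [1, 0, 0, 0]]
    print(select_max_distance(labeled, unlabeled))
```

Under this reading, each query goes to the region of feature space least covered by existing labels, which is one way a small labeled set could be extended efficiently.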