Computer Science ›› 2022, Vol. 49 ›› Issue (12): 353-361. doi: 10.11896/jsjkx.211000059
吕小少, 舒辉, 康绯, 黄宇垚
LYU Xiao-shao, SHU Hui, KANG Fei, HUANG Yu-yao
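Abstract: Hijacking attacks on online software upgrades are among the most common means of network attack. Program analysis is an important method for quickly and automatically evaluating the security of software upgrades, and rapid reverse-engineering localization of the upgrade functions within a program is a key prerequisite for enabling static analysis and improving the efficiency of dynamic analysis. Traditional reverse localization relies on manual experience, following the cross-reference chains of semantic information such as strings and API functions; it is inefficient and cannot be automated. To solve this problem, a software upgrade function localization method that combines semantic analysis with reverse analysis is proposed. First, an upgrade semantic classification model based on natural language processing is built for the semantic information commonly found in software binaries (such as strings, function names, and API functions). Then, reverse analysis tools are used to extract the software's semantic information, and the upgrade semantic classification model identifies the upgrade-related semantic information. Finally, an algorithm for solving for the key nodes of upgrade functions on the function call graph tree is defined and used to locate the upgrade functions. A prototype system for locating online software upgrade functions is designed and implemented, and reverse localization analysis of the upgrade function is carried out on 153 commonly used software products, of which 126 are located successfully. A preliminary security assessment of some of the software upgrades based on the localization results yields 1 vulnerability with a CNNVD identifier and 5 with CNVD identifiers.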
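The abstract does not spell out the key-node algorithm, so the following is only a minimal sketch of one plausible reading, under stated assumptions: the call graph is a plain dictionary from caller to callees, the set `tagged` holds functions that the classifier has flagged as referencing upgrade-related semantic information, and the "key node" is taken to be the function whose reachable callees cover the most tagged functions, preferring the smallest reachable set on ties. The function names in the example graph and the scoring rule are hypothetical and are not taken from the paper.

```python
from collections import deque


def reachable(call_graph, root):
    """Return every function reachable from root (root included) via call edges."""
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def find_key_node(call_graph, tagged):
    """Assumed heuristic: pick the function that covers the most tagged
    functions, breaking ties in favour of the smallest reachable set."""
    nodes = set(call_graph) | {c for cs in call_graph.values() for c in cs}
    best, best_score = None, None
    for node in sorted(nodes):  # sorted for deterministic results
        reach = reachable(call_graph, node)
        score = (len(reach & tagged), -len(reach))
        if best_score is None or score > best_score:
            best, best_score = node, score
    return best


# Hypothetical call graph recovered by a reverse analysis tool.
call_graph = {
    "main": ["init_ui", "check_update"],
    "check_update": ["http_get_version", "download_patch"],
    "download_patch": ["verify_signature"],
    "init_ui": ["load_icons"],
}
# Hypothetical functions flagged as upgrade-related by the classifier.
tagged = {"http_get_version", "download_patch", "verify_signature"}

key = find_key_node(call_graph, tagged)
print(key)
```

In this toy graph both `main` and `check_update` reach all three tagged functions, and the tie-break selects `check_update` because its reachable set is smaller, which matches the intuition that the upgrade entry point is the most specific function dominating the upgrade-related code.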