Computer Science ›› 2022, Vol. 49 ›› Issue (12): 353-361. doi: 10.11896/jsjkx.211000059

• Information Security •

Reverse Location of Software Online Upgrade Function Based on Semantic Guidance

LYU Xiao-shao, SHU Hui, KANG Fei, HUANG Yu-yao

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Received: 2021-10-10 Revised: 2022-04-23 Published: 2022-12-14
  • Corresponding author: SHU Hui (shuhui123@126.com)
  • Author contact: 289541163@qq.com
  • About author: LYU Xiao-shao, born in 1989, postgraduate. His main research interests include cyber security and reverse engineering. SHU Hui, born in 1974, Ph.D, professor, Ph.D supervisor. His main research interests include cyber security and reverse engineering.
  • Supported by:
    National Key R&D Program of China, "Frontier Science and Technology Innovation Special Project" (2019QY1305).

Abstract: Hijacking attacks on software online upgrades are among the most common methods of network attack. Program analysis is an important way to evaluate the security of software upgrades quickly and automatically, and rapid reverse location of the upgrade functions in a program is a key prerequisite for static analysis and for more efficient dynamic analysis. Traditional reverse location relies on manual experience, following the cross-reference chains of semantic information such as strings and API functions; it is inefficient and cannot be automated. To solve this problem, this paper proposes a method for locating software upgrade functions that combines semantic analysis with reverse analysis. First, an upgrade semantic classification model based on natural language processing is built for the semantic information commonly found in software binaries (strings, function names, API functions, etc.). Second, a reverse analysis tool extracts the semantic information of the software, and the classification model identifies the upgrade-related items among it. Finally, an algorithm is defined that solves for the key nodes of the upgrade function on the function call graph tree. A prototype system for locating software online upgrade functions is designed and implemented and applied to 153 commonly used software products, of which 126 are located successfully. A preliminary security evaluation of some software upgrades, based on the located functions, yielded one CNNVD-numbered vulnerability and five CNVD-numbered vulnerabilities.

Key words: Software online upgrade, Semantic information, Text classification model, Binary program reverse analysis, Function positioning

CLC Number:

  • TP393
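
The abstract outlines a three-step pipeline: classify semantic items, tag the functions that reference them, then solve for a key node on the function call graph. The sketch below illustrates that general idea only and is not the paper's algorithm. It assumes a hypothetical keyword list (`UPGRADE_KEYWORDS`) in place of the NLP classification model, a call graph given as a plain dictionary rather than one extracted by a reverse analysis tool, and a simple selection rule: pick the function that reaches the most tagged functions while reaching the fewest functions overall. All function names and data are invented for illustration.

```python
from collections import deque

# Hypothetical keyword list standing in for the paper's NLP classifier.
UPGRADE_KEYWORDS = ("update", "upgrade", "patch", "newversion")


def is_upgrade_semantic(text):
    """Crude stand-in for the classification model: keyword match."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in UPGRADE_KEYWORDS)


def reachable(call_graph, start):
    """Return every function reachable from `start`, including itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, ()):
            if callee not in seen:  # guards against recursive call cycles
                seen.add(callee)
                queue.append(callee)
    return seen


def locate_upgrade_function(call_graph, semantics):
    """Pick a candidate key node (assumed rule, not the paper's).

    call_graph: dict mapping caller name -> list of callee names.
    semantics:  dict mapping function name -> list of strings it references.
    """
    tagged = {
        func for func, items in semantics.items()
        if any(is_upgrade_semantic(item) for item in items)
    }
    if not tagged:
        return None
    nodes = set(call_graph) | set(semantics)
    nodes |= {callee for callees in call_graph.values() for callee in callees}
    best_key, best_node = None, None
    for node in sorted(nodes):
        reach = reachable(call_graph, node)
        # Prefer more tagged functions covered, then a smaller reachable set.
        key = (-len(tagged & reach), len(reach), node)
        if best_key is None or key < best_key:
            best_key, best_node = key, node
    return best_node


# Invented example data.
example_graph = {
    "main": ["init_ui", "check_update"],
    "check_update": ["download_file", "verify_sig"],
    "init_ui": ["load_cfg"],
}
example_semantics = {
    "download_file": ["http://example.com/update.xml"],
    "verify_sig": ["NewVersion.exe"],
    "load_cfg": ["config.ini"],
}
print(locate_upgrade_function(example_graph, example_semantics))
```

In this toy graph both `main` and `check_update` reach the two tagged functions, and the tie-break on reachable-set size favors the more specific `check_update`. A real system would replace the keyword match with a trained text classifier and would build the graph from disassembly.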