Computer Science ›› 2023, Vol. 50 ›› Issue (1): 373-379. doi: 10.11896/jsjkx.211100121

• Information Security •

Feature Extraction Method for Public Component Libraries Based on Cross-fingerprint Analysis

GUO Wei, WU Zehui, WU Qianqiong, LI Xixing

  1. School of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China
    State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Received: 2021-11-11 Revised: 2022-06-21 Online: 2023-01-15 Published: 2023-01-09
  • Corresponding author: WU Zehui (wuzehui2010@foxmail.com)
  • About author: GUO Wei, born in 1996, postgraduate. His main research interests include cyberspace security and software engineering.
    WU Zehui, born in 1988, Ph.D. His main research interests include software security analysis and vulnerability analysis of cloud platforms.
  • Author e-mail: 1037057802@qq.com
  • Supported by:
    National Key Research and Development Project (2019QY0501).

Abstract: The widespread use of public software component libraries speeds up software development but also enlarges the attack surface of software. A vulnerability in a public component library propagates to every piece of software that uses the library, and compatibility, stability, and development-delay issues make such vulnerabilities difficult to fix and slow to patch. Software composition analysis is an important means of addressing this problem, but because the selected features are often not effective enough and precise features are hard to extract from public component libraries, its accuracy is low and it generally stops at identifying the kind of library in use. This paper proposes a feature extraction method for public component libraries based on cross-fingerprint analysis. A fingerprint library is built from 25 000 open-source projects on GitHub, and cross-fingerprints of component libraries are extracted through source-code string role classification, exported-function fingerprint analysis, and binary compilation fingerprint analysis, which enables precise localization of public component libraries. A prototype tool, LVRecognizer, was developed and evaluated on 516 real-world software packages, achieving a precision of 94.74%.

Key words: Software composition analysis, Component identification, Dynamic link library, Version identification
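The abstract names three fingerprint sources but does not say how they are combined. The following is a minimal sketch of one possible combination, not the LVRecognizer implementation. The containment score, the equal-weight average, the 0.6 threshold, and all names (`Fingerprint`, `containment`, `identify`, `libdemo`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical record holding the three kinds of features for one binary or library version."""
    library: str
    version: str
    strings: frozenset = frozenset()        # string literals taken from source code
    exports: frozenset = frozenset()        # exported function names
    compiler_tags: frozenset = frozenset()  # compilation-related markers


def containment(reference, observed):
    """Fraction of the reference features that also appear in the observed set."""
    if not reference:
        return 0.0
    return len(reference & observed) / len(reference)


def identify(target, database, threshold=0.6):
    """Return (library, version) of the best-scoring entry, or None if none reaches the threshold.

    The score is the plain average of the three containment values (an assumption).
    """
    best, best_score = None, 0.0
    for fp in database:
        score = (
            containment(fp.strings, target.strings)
            + containment(fp.exports, target.exports)
            + containment(fp.compiler_tags, target.compiler_tags)
        ) / 3
        if score >= threshold and score > best_score:
            best, best_score = (fp.library, fp.version), score
    return best


if __name__ == "__main__":
    # Made-up fingerprint library with two versions of a hypothetical component.
    database = [
        Fingerprint("libdemo", "1.0",
                    frozenset({"libdemo 1.0", "bad header"}),
                    frozenset({"demo_init", "demo_free"}),
                    frozenset({"GCC 9"})),
        Fingerprint("libdemo", "1.1",
                    frozenset({"libdemo 1.1", "bad header"}),
                    frozenset({"demo_init", "demo_free", "demo_reset"}),
                    frozenset({"GCC 9"})),
    ]
    target = Fingerprint("unknown", "unknown",
                         frozenset({"libdemo 1.1", "bad header", "app started"}),
                         frozenset({"demo_init", "demo_free", "demo_reset"}),
                         frozenset({"GCC 9"}))
    print(identify(target, database))
```

In this made-up data the exported functions alone match both versions fully, so they cannot tell 1.0 from 1.1. Only the combination with the string features separates the two, which is the intuition behind crossing several fingerprint kinds.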

CLC Number:

  • TP311