Computer Science, 2023, Vol. 50, Issue (1): 373-379. doi: 10.11896/jsjkx.211100121

• Information Security •

Feature Extraction Method for Public Component Libraries Based on Cross-fingerprint Analysis

GUO Wei, WU Zehui, WU Qianqiong, LI Xixing   

  1. School of Cyberspace Security, University of Information Engineering, Zhengzhou 450001, China
    State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Received: 2021-11-11 Revised: 2022-06-21 Online: 2023-01-15 Published: 2023-01-09
  • About author: GUO Wei, born in 1996, postgraduate. His main research interests include cyberspace security and software engineering.
    WU Zehui, born in 1988, Ph.D. His main research interests include software security analysis and vulnerability analysis of cloud platforms.
  • Supported by:
    National Key Research and Development Project (2019QY0501).

Abstract: The widespread use of public software component libraries speeds up software development but also expands the attack surface of software. A vulnerability in a public component library spreads to every piece of software that uses the library's files, and concerns over compatibility and stability, together with development delays, make such vulnerabilities difficult to fix and lengthen the patching cycle. Software component analysis is an important tool for addressing this problem. However, because feature selection is ineffective and accurate features are hard to extract from public component libraries, the accuracy of component analysis is low, and it generally goes no further than identifying the kind of component. This paper proposes a feature extraction method for public component libraries based on cross-fingerprint analysis. A fingerprint library is built from 25,000 open source projects on the GitHub platform. Techniques including source string role classification, export function fingerprint analysis, and binary compilation fingerprint analysis are proposed to extract cross-fingerprints of component libraries and to locate public component libraries accurately. A prototype tool, LVRecognizer, is developed and evaluated on 516 real-world software programs, achieving an accuracy of 94.74%.
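The idea of combining several kinds of fingerprint can be illustrated with a minimal sketch. It is not the LVRecognizer implementation, whose details the abstract does not give. The library names, fingerprint contents, the `overlap` score, and the `threshold` parameter are all hypothetical, and only two fingerprint kinds (strings and export functions) are modeled. A candidate is accepted only when both fingerprint kinds agree.

```python
# Illustrative sketch only: a toy fingerprint library with made-up entries.
# Keys are (component, version); values hold two kinds of fingerprint.
FINGERPRINT_DB = {
    ("libfoo", "1.0"): {
        "strings": {"libfoo 1.0", "bad header"},
        "exports": {"foo_init", "foo_read"},
    },
    ("libfoo", "1.1"): {
        "strings": {"libfoo 1.1", "bad header"},
        "exports": {"foo_init", "foo_read", "foo_seek"},
    },
    ("libbar", "2.0"): {
        "strings": {"libbar 2.0"},
        "exports": {"bar_open"},
    },
}


def overlap(observed, reference):
    """Fraction of the reference fingerprints present in the observed set."""
    if not reference:
        return 0.0
    return len(observed & reference) / len(reference)


def identify(strings, exports, db=FINGERPRINT_DB, threshold=0.5):
    """Return ((component, version), score) for the best cross-matched entry.

    An entry qualifies only if BOTH its string fingerprints and its export
    function fingerprints reach the (assumed) threshold; the score is the
    mean of the two overlaps. Returns (None, 0.0) when nothing qualifies.
    """
    best, best_score = None, 0.0
    for key, fp in db.items():
        s = overlap(strings, fp["strings"])
        e = overlap(exports, fp["exports"])
        if s < threshold or e < threshold:
            continue  # the two fingerprint kinds do not corroborate each other
        score = (s + e) / 2
        if score > best_score:
            best, best_score = key, score
    return best, best_score


if __name__ == "__main__":
    observed_strings = {"libfoo 1.1", "bad header", "usage: app"}
    observed_exports = {"foo_init", "foo_read", "foo_seek", "main"}
    print(identify(observed_strings, observed_exports))
```

In this toy setup, requiring agreement between independent fingerprint kinds is what separates two versions of the same library: the shared string "bad header" matches both `libfoo` entries, but only version 1.1 is fully covered by both the strings and the exports.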

Key words: Software component analysis, Component identification, Dynamically linked library, Version identification

CLC Number: TP311