Computer Science, 2025, Vol. 52, Issue (10): 395-403. doi: 10.11896/jsjkx.241000014
赵宁, 王金双, 崔帅
ZHAO Ning, WANG Jinshuang, CUI Shuai
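Abstract: Misconfigured Dockerfiles readily introduce container security vulnerabilities. Existing detection methods focus on structural analysis and semantic understanding of the text and pay little attention to metric features such as instruction usage frequency, number of image layers, and code complexity, so their accuracy is limited. To address this problem, a dual-stream deep learning detection method is proposed that fuses feature metrics with deep semantic understanding. The method first uses static detection tools to identify and label Dockerfile samples that contain security misconfigurations. It then builds an abstract syntax tree to parse each file and extract code metric features, and applies the random forest algorithm to further select the key security features. Finally, it extracts the text information and the security feature metric information and feeds both into a dual-stream model for detection. The model uses a bidirectional long short-term memory network (BiLSTM) to track forward and backward dependencies in the instruction sequence and mine deep semantic associations; applies a Transformer model to build a high-dimensional metric representation and capture the complex mapping from metrics to security configuration defects; and uses a convolutional neural network sub-layer with the ReLU activation function to fuse the two streams, thereby detecting Dockerfile security misconfigurations. Experiments show that the precision, recall, and F1 score of the proposed method reach 96%, 98%, and 97%, respectively, an improvement over existing detection methods, and that the method can effectively identify Dockerfile security misconfigurations.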
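As a rough illustration of the metric features the abstract mentions (instruction usage frequency and image layer count), the following minimal sketch counts instructions in a Dockerfile. It is not the authors' extractor: it uses a simple line-based pass rather than an abstract syntax tree, and the function names, feature names, and sample Dockerfile are hypothetical. The layer count assumes that only `RUN`, `COPY`, and `ADD` add image layers.

```python
from collections import Counter

# Assumption: only these instructions add a filesystem layer to the image.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(text):
    """Join backslash-continued lines and drop blank lines and comments."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # Continuation: keep accumulating until a line without a backslash.
            buffer += line[:-1].strip() + " "
            continue
        yield (buffer + line).strip()
        buffer = ""
    if buffer.strip():
        yield buffer.strip()


def dockerfile_metrics(text):
    """Return simple metric features for one Dockerfile (hypothetical names)."""
    instructions = [line.split(None, 1)[0].upper() for line in logical_lines(text)]
    counts = Counter(instructions)
    return {
        "instruction_count": len(instructions),
        "layer_count": sum(counts[name] for name in LAYER_INSTRUCTIONS),
        "run_count": counts["RUN"],
        # No USER instruction usually means the container runs as root.
        "has_user": counts["USER"] > 0,
    }


# Hypothetical sample input, used only to exercise the function.
SAMPLE = """\
FROM ubuntu:latest
# install curl
RUN apt-get update && \\
    apt-get install -y curl
COPY . /app
CMD ["/app/start.sh"]
"""

metrics = dockerfile_metrics(SAMPLE)
print(metrics)
```

A feature dictionary of this kind could serve as one row of the metric input described in the abstract, with a feature-selection step such as random forest importance ranking applied afterwards; how the paper actually encodes and selects its features is not specified here.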