Computer Science ›› 2025, Vol. 52 ›› Issue (10): 395-403. doi: 10.11896/jsjkx.241000014

• Information Security •

Dual-stream Feature Fusion Approach for Dockerfile Security Misconfiguration Detection

ZHAO Ning, WANG Jinshuang, CUI Shuai

  1. Institute of Command Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2024-10-08 Revised: 2024-12-06 Online: 2025-10-15 Published: 2025-10-14
  • Corresponding author: WANG Jinshuang (siyezhishuang@163.com)
  • About author: ZHAO Ning (zhaonig@yeah.net), born in 1993, postgraduate. Her main research interest is system security.
    WANG Jinshuang, born in 1978, Ph.D., associate professor. His main research interest is system security.

Abstract: Dockerfile misconfigurations frequently lead to container security vulnerabilities. Existing detection methods focus on structural analysis and semantic understanding of the text but pay little attention to metric features such as instruction frequency, image layer count, and code complexity, which limits their accuracy. To address this problem, a dual-stream deep learning detection approach is proposed that integrates feature metrics with deep semantic understanding. First, static detection tools such as Hadolint and KICS are used to identify and label Dockerfile samples that contain security misconfigurations. Then, abstract syntax trees are constructed to parse the files and extract code metric features, and the random forest algorithm is used to select the key security features. Finally, textual information and security feature metrics are extracted and fed into a dual-stream model for detection. In this model, a Bi-LSTM network tracks forward and backward dependencies within instruction sequences to uncover deep semantic associations; a Transformer model builds high-dimensional metric representations that capture the complex mapping from metrics to security configuration defects; and CNN sublayers with ReLU activation functions fuse the information from both streams. Experimental results show that the proposed method achieves a precision, recall, and F1-score of 96%, 98%, and 97%, respectively, outperforming existing detection methods and effectively identifying Dockerfile security misconfigurations.

Key words: Container security, Dockerfile, Security misconfiguration detection, Deep learning, Dual-stream model

CLC Number:

  • TP391
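
To make the metric features mentioned in the abstract (instruction frequency, image layer count) more concrete, the following is a minimal sketch, not the authors' implementation. It uses a simple line-based parser rather than a full abstract syntax tree. The function names `logical_lines` and `dockerfile_metrics`, the metric names, and the choice to count RUN, COPY, and ADD as layer-creating instructions are all assumptions made for this example.

```python
from collections import Counter

# Assumption for this sketch: these instructions are treated as layer-creating.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(text):
    """Yield Dockerfile instructions with backslash continuations joined."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            # Instruction continues on the next line: accumulate it.
            buffer += line[:-1].rstrip() + " "
            continue
        yield buffer + line
        buffer = ""
    if buffer:
        # File ended in the middle of a continuation.
        yield buffer.strip()


def dockerfile_metrics(text):
    """Return simple count-based metrics for one Dockerfile (hypothetical metric set)."""
    instructions = [line.split(None, 1)[0].upper() for line in logical_lines(text)]
    freq = Counter(instructions)
    return {
        "instruction_count": len(instructions),
        "layer_count": sum(freq[name] for name in LAYER_INSTRUCTIONS),
        "instruction_frequency": dict(freq),
    }


example = """\
FROM ubuntu:latest
# install tools
RUN apt-get update && \\
    apt-get install -y curl
COPY . /app
USER root
"""

metrics = dockerfile_metrics(example)
print(metrics["layer_count"], metrics["instruction_count"])  # prints: 2 4
```

A vector of such counts could serve as input to a feature-selection step or to the metric stream of a model, though the actual feature set and parser used in the paper are not specified in the abstract.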