Dual-stream Feature Fusion Approach for Dockerfile Security Misconfiguration Detection

doi:10.11896/jsjkx.241000014

Abstract

Dockerfile misconfigurations frequently lead to container security vulnerabilities. Current detection methods rely on structural analysis and semantic understanding of the text, but pay little attention to metrics such as command frequency, image layer count, and code complexity. To address this gap, a dual-stream deep learning detection approach is proposed that integrates feature metrics with semantic comprehension. First, Dockerfile samples containing security misconfigurations are identified and annotated using static detection tools such as Hadolint and KICS. Next, abstract syntax trees are constructed to parse the Dockerfiles and extract code metric features, and the random forest algorithm is used to select the most important security features. Finally, the textual information and security feature metrics are fed into a dual-stream model for detection. A Bi-LSTM network traces the forward and backward dependencies within instruction sequences, which helps uncover deep semantic associations. A Transformer model builds high-dimensional metric representations that model the mapping from metrics to security misconfigurations. CNN sublayers with ReLU activation functions fuse the information from both streams. Experimental results show that the proposed method achieves 96% precision, 98% recall, and a 97% F1-score, and that it detects security misconfigurations more accurately than existing approaches.
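To make the metric stream more concrete, the following is a minimal sketch of how features such as command frequency and image layer count might be computed from a Dockerfile. It is an illustration under stated assumptions, not the authors' implementation: the abstract describes parsing via abstract syntax trees, whereas this sketch uses simple line-based parsing, and the function names `logical_lines` and `dockerfile_metrics` as well as the chosen feature names are hypothetical.

```python
from collections import Counter

# Assumption for illustration: only RUN, COPY, and ADD are counted as
# layer-creating instructions.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(text):
    """Join backslash-continued lines and drop blank lines and comments."""
    lines, buffer = [], ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            # Keep accumulating until the continuation ends.
            buffer += stripped[:-1].strip() + " "
            continue
        lines.append(buffer + stripped)
        buffer = ""
    if buffer:
        lines.append(buffer.strip())
    return lines


def dockerfile_metrics(text):
    """Return a small, hypothetical metric vector for one Dockerfile."""
    instructions = [line.split(None, 1)[0].upper() for line in logical_lines(text)]
    freq = Counter(instructions)
    return {
        "instruction_count": len(instructions),
        "layer_count": sum(freq[name] for name in LAYER_INSTRUCTIONS),
        "run_count": freq["RUN"],
        "has_user": int(freq["USER"] > 0),  # 0 means no USER instruction is set
    }


SAMPLE = """\
FROM ubuntu:latest
# install packages
RUN apt-get update && \\
    apt-get install -y curl
COPY . /app
CMD ["/app/run.sh"]
"""

if __name__ == "__main__":
    print(dockerfile_metrics(SAMPLE))
```

A vector like this could then be passed to a feature-selection step (the abstract names random forest) before being fed to the metric stream of the model; the specific features used in the paper are not listed in the abstract.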

Key words: Container security, Dockerfile, Security misconfiguration detection, Deep learning, Dual-stream model

CLC Number: TP391

ZHAO Ning, WANG Jinshuang, CUI Shuai. Dual-stream Feature Fusion Approach for Dockerfile Security Misconfiguration Detection[J]. Computer Science, 2025, 52(10): 395-403.
