Computer Science, 2024, Vol. 51, Issue (12): 79-86. doi: 10.11896/jsjkx.231200100

• Computer Software •

Ensemble Learning Based Open Source License Detection and Compatibility Assessment

BAI Jianghao, PIAO Yong   

  1. School of Software Engineering, Dalian University of Technology, Dalian, Liaoning 116620, China
  • Received: 2023-12-15 Revised: 2024-05-06 Online: 2024-12-15 Published: 2024-12-10
  • About author: BAI Jianghao, born in 1999, postgraduate. His main research interests include natural language processing and so on.
    PIAO Yong, born in 1975, Ph.D, associate professor, is a member of CCF (No. E3677M). His main research interests include data mining and intelligent computing.

Abstract: The quality and evolution of software are profoundly influenced by the security and reliability of the software supply chain. An essential element of this chain is the analysis of licenses associated with different software components. Open source licenses play a vital role in defining conditions for using open source software, safeguarding intellectual property, and ensuring the sustained development of open source projects. To mitigate legal risks and protect against property losses, it is imperative to accurately identify open source software licenses and assess their compatibility. In this paper, we propose an innovative method for detecting open source licenses using ensemble learning, complemented by a recommendation system based on compatibility. Our main approach leverages ensemble learning techniques, particularly emphasizing the use of large language models. To bolster the accuracy of open source license detection, this methodology is augmented with rule matching. Subsequently, compatibility assessments and license recommendations are derived using directed graph algorithms. Experimental results validate the effectiveness of our method, showcasing not only reduced maintenance costs and heightened scalability but also superior detection performance in comparison to traditional methods. The proposed approach excels in identifying compatibility issues and provides dependable recommendations, thereby contributing to a more secure and reliable software supply chain.

Key words: Large language model, Ensemble learning, Open source license, Sentence vector similarity, Compatibility assessment

CLC Number: TP391
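
The abstract states that compatibility assessments and license recommendations are derived with directed graph algorithms, but it does not give the graph or the algorithm. The following is a minimal sketch of one way such a check could work, assuming that an edge A -> B means "code under license A may be incorporated into a work released under license B" and that compatibility is transitive along edges. The edge set `COMPAT_EDGES` and the function names `reachable`, `is_compatible`, and `recommend` are hypothetical illustrations, not the authors' implementation and not legal guidance.

```python
from collections import deque

# Hypothetical, simplified compatibility graph (an assumption for illustration).
# An edge A -> B means code under license A may be used in a work released under B.
COMPAT_EDGES = {
    "MIT": ["BSD-3-Clause", "Apache-2.0"],
    "BSD-3-Clause": ["Apache-2.0"],
    "Apache-2.0": ["GPL-3.0-only"],
    "GPL-3.0-only": [],
}


def reachable(graph, start):
    """Return every license reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_compatible(graph, component_license, project_license):
    """A component fits a project if the project license is reachable from it."""
    return project_license in reachable(graph, component_license)


def recommend(graph, component_licenses):
    """Recommend project licenses reachable from every component license."""
    candidates = None
    for lic in component_licenses:
        reach = reachable(graph, lic)
        candidates = reach if candidates is None else candidates & reach
    return sorted(candidates or [])


if __name__ == "__main__":
    print(is_compatible(COMPAT_EDGES, "MIT", "GPL-3.0-only"))
    print(recommend(COMPAT_EDGES, ["MIT", "Apache-2.0"]))
```

Under this assumed model, a recommendation is simply the intersection of the reachable sets of all component licenses, so an empty result signals that no single project license in the graph can accommodate every component.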