计算机科学 ›› 2024, Vol. 51 ›› Issue (6): 78-84.doi: 10.11896/jsjkx.230300224

• 计算机软件 • 上一篇    下一篇

基于序列的程序语义规则挖掘与违规检测方法

李孜1, 周宇1,2   

  1. 1 南京航空航天大学计算机科学与技术学院 南京 210016
    2 高安全系统的软件开发与验证技术工信部重点实验室 南京 211100
  • 收稿日期:2023-03-30 修回日期:2023-09-14 出版日期:2024-06-15 发布日期:2024-06-05
  • 通讯作者: 周宇(zhouyu@nuaa.edu.cn)
  • 作者简介:(zl.2021@nuaa.edu.cn)
  • 基金资助:
    国家自然科学基金(61972197);国防基础科研项目(JCKY2022605C006);江苏省自然科学基金(BK20201292)

Sequence-based Program Semantic Rule Mining and Violation Detection

LI Zi1, ZHOU Yu1,2   

  1. 1 College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 210016,China
    2 Ministry Key Laboratory for Safety-Critical Software Development and Verification,Nanjing 211100,China
  • Received:2023-03-30 Revised:2023-09-14 Online:2024-06-15 Published:2024-06-05
  • About author:LI Zi,born in 1998,postgraduate.His main research interests include software evolution analysis,artificial intelligence and software artifacts mining.
    ZHOU Yu,born in 1981,postdoctor,professor.His main research interests include software evolution analysis,mining software repositories,software architecture,and reliability analysis.
  • Supported by:
    National Natural Science Foundation of China(61972197),Defense Industrial Technology Development Program(JCKY2022605C006) and Natural Science Foundation of Jiangsu Province,China(BK20201292).

摘要: 在软件开发中,违反语义规则的源码可以正常地编译或运行,但却存在性能或功能上的缺陷。因此,如何准确地检测此类缺陷成为了一项挑战。已有的研究通常采用基于项集的规则挖掘与检测方法,但由于未能良好地结合源码的顺序信息与控制流信息,此类方法在检测能力以及准确率上都存在较大的提升空间。针对该问题,提出了一种基于序列的程序语义规则提取与违规检测方法 SPUME。该方法将程序源码转化为中间表示序列,使用序列规则挖掘算法从中提取语义规则,并基于语义规则对源码中的缺陷进行检测。为验证SPUME 的有效性,文中将其与3种基线方法进行了对比,包括PR-Miner,Tikanga以及Bugram。实验结果表明,相较于基于无序项集进行规则挖掘的PR-Miner,以及结合了图模型的Tikanga,SPUME在检测效果、检测速度以及准确率上都有显著提升。相比基于Ngram语言模型的Bugram方法,SPUME在准确率与其相当的情况下,高效地检测出了更多程序缺陷。

关键词: 语义规则挖掘, 重叠聚类, 缺陷检测

Abstract: In software development,source code that violates semantic rules may compile or run normally but may have defects in performance or functionality.Therefore,accurately detecting such defects has become a challenge.Existing research usually adopts itemset-based rule mining and detection methods,but these methods have significant room for improvement in detection ability and accuracy due to the failure to integrate the order information and control flow information of source code effectively.To address this problem,this paper propose a sequence-based method called SPUME for extracting and detecting program semantic rules.The method converts program source code into an intermediate representation sequence,extracts semantic rules from it using sequence rule mining algorithms,and detects defects in the source code based on these rules.To verify the effectiveness of SPUME,it is compared with three baseline methods,including PR-Miner,Tikanga,and Bugram.Experimental results show that compared with PR-Miner,which is based on unordered itemset mining,and Tikanga,which combines graph models,SPUME has significantly improved detection performance,speed,and accuracy.Compared with Bugram,which is based on Ngram language models,SPUME detects more program defects more efficiently while maintaining a similar level of accuracy.

Key words: Semantic rule mining, Overlapping clustering, Defect detection

中图分类号: 

  • TP391
[1]LI Z,ZHOU Y.PR-Miner:automatically extracting implicit programming rules and detecting violations in large software code[J].ACM SIGSOFT Software Engineering Notes,2005,30(5):306-315.
[2]LIVSHITS B,ZIMMERMANN T.Dynamine:finding commonerror patterns by mining software revision histories[J].ACM SIGSOFT Software Engineering Notes,2005,30(5):296-305.
[3]CHANG R Y,PODGURSKI A,YANG J.Finding what’s not there:a new approach to revealing neglected conditions in software[C]//Proceedings of the 2007 International Symposium on Software Testing and Analysis.2007:163-173.
[4]ANDRZEJ W,ANDREAS Z,CHRISTIAN L.Detecting object usage anomalies[C]//Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Enginee-ring(ESEC-FSE’07).Association for Computing Machinery,New York,NY,USA,2007:35-44.
[5]WASYLKOWSKI A,ZELLER A.Mining temporal specifica-tions from object usage[J].Automated Software Engineering,2011,18:263-292.
[6]WANG S,CHOLLAK D,MOVSHOVITZ-ATTIAS D,et al.Bugram:bug detection with n-gram language models[C]//Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering.2016:708-719.
[7]ZHOU Y,ZHAN W,LI Z,et al.DRIVE:Dockerfile Rule Mi-ning and Violation Detection[J].arXiv:2212.05648,2022.
[8]LUNA J M,FOURNIER-VIGER P,VENTURA S.Frequent itemset mining:A 25 years review[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2019,9(6):e1329.
[9]CHEN S F,GOODMAN J.An empirical study of smoothingtechniques for language modeling[J].Computer Speech & Language,1999:13(4):359-394.
[10]KAUR J,MADAN N.Association rule mining:A survey[J].International Journal of Hybrid Information Technology,2015,8(7):239-242.
[11]FOURNIER-VIGER P,NKAMBOU R,TSENG V S M.RuleGrowth:mining sequential rules common to several sequences by pattern-growth[C]//Proceedings of the 2011 ACM Sympo-sium on Applied Computing.2011:956-961.
[12]MCNICHOLAS P D,MURPHY T B,O’REGAN M.Standardi-sing the lift of an association rule[J].Computational Statistics &Data Analysis,2008,52(10):4712-4721.
[13]FOURNIER-VIGER P,GUENICHE T,TSENG V S.Usingpartially-ordered sequential rules to generate more accurate sequence prediction[C]//Advanced Data Mining and Applications:8th International Conference(ADMA 2012).Nanjing,China,Springer Berlin Heidelberg,2012:431-442.
[14]PEI J,HAN J,MORTAZAVI-ASL B,et al.Mining sequential patterns by pattern-growth:The prefixspan approach[J].IEEE Transactions on knowledge and data engineering,2004,16(11):1424-1440.
[15]NAYROLLES M,MOHA N,VALTCHEV P.Improving SOA antipatterns detection in Service Based Systems by mining execution traces[C]//2013 20th Working Conference on Reverse Engineering(WCRE).IEEE,2013:321-330.
[16]KAMSU-FOGUEM B,RIGAL F,MAUGET F.Mining association rules for the quality improvement of the production process[J].Expert Systems with Applications,2013,40(4):1034-1045.
[17]ELIBEN.PyCParser[EB/OL].https://github.com-/eliben/pycparser.
[18]BAADEL S,THABTAH F,LU J.Overlapping clustering:A review[C]//2016 SAI Computing Conference(SAI).IEEE,2016:233-237.
[19]IENCO D,BORDOGNA G.Fuzzy extensions of the DBScanclustering algorithm[J].Soft Computing,2018,22(5):1719-1730.
[20]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[21]FENG Z,GUO D,TANG D,et al.Codebert:A pre-trained mo-del for programming and natural languages[J].arXiv:2002.08155,2020.
[22]LAI W,ZHOU M,HU F,et al.A new DBSCAN parameters determination method based on improved MVO[J].IEEE Access,2019,7:104085-104095.
[23]KIM I Y,DE WECK O L.Adaptive weighted-sum method for bi-objective optimization:Pareto front generation[J].Structural and Multidisciplinary Optimization,2005,29:149-158.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!