Computer Science ›› 2017, Vol. 44 ›› Issue (Z6): 486-490. doi: 10.11896/j.issn.1002-137X.2017.6A.108

• Big Data and Data Mining •

Mining of API Usage Pattern Based on Clustering and Partial Order Sequences

WANG Shu-yi and DONG Dong

  1. School of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China
  • Published: 2017-12-01 Online: 2018-12-01
  • Funding: This work was supported by the Natural Science Foundation of Hebei Province (F2013205192).

Abstract: During software development, developers often need to follow specific usage patterns of an application programming interface (API), yet almost none of these patterns are documented for developers to refer to. To mine API usage patterns, this paper proposes an approach based on clustering and frequent closed partial order sequence mining. The source code is first parsed into abstract syntax trees, from which API method call sequences are extracted; these sequences are then clustered hierarchically. Finally, API usage patterns are mined with the depth-first frequent closed partial order algorithm (DFP). Experimental results show that, on the same dataset, this approach yields a more succinct set of candidate API usage patterns than SPADE and BIDE.

Key words: API usage pattern, Sequential pattern mining, Hierarchical clustering, Partial order
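The pipeline described in the abstract (API call sequences, then hierarchical clustering, then partial-order patterns per group) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the call sequences are hypothetical stand-ins for sequences extracted from ASTs; the LCS-based distance, average linkage and the 0.5 cut-off are assumptions, since the abstract does not give the similarity measure; and instead of DFP, which enumerates all frequent closed partial orders above a support threshold, each cluster is summarized only by the precedence pairs that hold in every one of its sequences.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two call sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def distance(a, b):
    """Assumed sequence distance: 1 - LCS / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


def hierarchical_clusters(seqs, threshold):
    """Average-linkage agglomerative clustering of sequence indices.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical stopping rule)."""
    clusters = [[i] for i in range(len(seqs))]

    def avg_dist(c1, c2):
        total = sum(distance(seqs[i], seqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: avg_dist(clusters[pq[0]], clusters[pq[1]]))
        if avg_dist(clusters[p], clusters[q]) > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


def common_partial_order(seqs):
    """Pairs (x, y) such that x precedes y in every sequence of the group.

    Only calls shared by all sequences are kept, and a repeated call is
    represented by its first occurrence (a simplifying assumption)."""
    firsts = []
    for s in seqs:
        pos = {}
        for k, call in enumerate(s):
            pos.setdefault(call, k)
        firsts.append(pos)
    shared = set.intersection(*(set(pos) for pos in firsts))
    return {(x, y) for x in shared for y in shared
            if x != y and all(pos[x] < pos[y] for pos in firsts)}


def hasse_edges(order):
    """Transitive reduction: drop (x, y) when some z satisfies x < z < y."""
    items = {a for pair in order for a in pair}
    return {(x, y) for (x, y) in order
            if not any((x, z) in order and (z, y) in order for z in items)}


# Hypothetical API call sequences, standing in for sequences extracted from ASTs.
calls = [
    ["Connection.createStatement", "Statement.setFetchSize", "Statement.setMaxRows",
     "Statement.executeQuery", "ResultSet.next", "ResultSet.close"],
    ["Connection.createStatement", "Statement.setMaxRows", "Statement.setFetchSize",
     "Statement.executeQuery", "ResultSet.next", "ResultSet.close"],
    ["FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine",
     "BufferedReader.close"],
    ["FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine",
     "BufferedReader.readLine", "BufferedReader.close"],
]

clusters = hierarchical_clusters(calls, threshold=0.5)
orders = [common_partial_order([calls[i] for i in c]) for c in clusters]

if __name__ == "__main__":
    for c, order in zip(clusters, orders):
        print(c, sorted(hasse_edges(order)))
```

In the JDBC-style cluster, `setFetchSize` and `setMaxRows` occur in either order, so the sketch leaves them unordered relative to each other while keeping both between `createStatement` and `executeQuery`. With a minimum support of both sequences, a closed sequential pattern miner such as BIDE would report two totally ordered patterns here (one containing each setter), whereas a single partial order covers both; this illustrates the kind of succinctness the abstract attributes to partial-order mining.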

[1] KHATOON S,MAHMOOD A,LI G.An evaluation of source code mining techniques[C]∥International Conference on Fuzzy Systems and Knowledge Discovery.2011:1929-1933.
[2] PICCIONI M,FURIA C A,MEYER B.An Empirical Study of API Usability[C]∥Empirical Software Engineering and Measurement.New York:ACM,2013:35-44.
[3] ROBILLARD M P.What makes APIs hard to learn? Answers from developers[J].IEEE Software,2009,26(6):27-34.
[4] THUMMALAPENTA S,XIE T.PARSEWeb:a programmer assistant for reusing open source code on the Web[C]∥Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering.New York:ACM,2007:204-213.
[5] LI Z,ZHOU Y.PR-Miner:automatically extracting implicit programming rules and detecting violations in large software code[C]∥European Software Engineering Conference/Foundations of Software Engineering.New York:ACM,2005:306-315.
[6] XIE T,PEI J.MAPO:Mining API usages from open source repositories[C]∥Proceedings of the 2006 international workshop on Mining software repositories.New York:ACM,2006:54-57.
[7] NGUYEN T T,NGUYEN H A,PHAM N H,et al.Graph-based Mining of Multiple Object Usage Patterns[C]∥European Software Engineering Conference/Foundations of Software Engineering.ACM,2009:383-392.
[8] AKBAR R J,OMORI T,MARUYAMA K.Mining API Usage Patterns by Applying Method Categorization to Improve Code Completion[J].IEICE Transactions on Information and Systems,2014,E97-D(5):1069-1083.
[9] SAIED M A,BENOMAR O,SAHRAOUI H,et al.Mining multi-level API usage patterns[C]∥2015 IEEE 22nd International Conference on Software Analysis,Evolution,and Reengineering (SANER).2015:23-32.
[10] LIAO X,YIN J W,CAI F.Creation and traversal of abstract syntax tree based on Java language[J].Journal of Changsha University,2004,18(4):50-53.(in Chinese)
[11] SUN J G,LIU J,ZHAO L Y.Clustering algorithms research[J].Journal of Software,2008,19(1):48-61.(in Chinese)
[12] ACHARYA M,XIE T,PEI J.Mining API patterns as partial orders from source code:from usage scenarios to specification[C]∥European Software Engineering Conference/Foundations of Software Engineering.New York:ACM,2007:25-34.
[13] CASAS-GARRIGA G.Summarizing Sequential Data with Closed Partial Orders[C]∥5th SIAM International Conference on Data Mining.2005.
[14] WANG J,XIE T,ZHANG D,et al.Mining succinct and high-coverage API usage patterns from source code[C]∥Working Conference on Mining Software Repositories.2013:319-328.
[15] PEI J,WANG H,YU P,et al.Discovering frequent closed partial orders from strings[J].IEEE Transactions on Knowledge and Data Engineering,2006,18(11):1467-1481.
[16] ZHONG H,XIE T,PEI J,et al.MAPO:mining and recommending API usage patterns[C]∥Proceedings of the 23rd European Conference on Object-Oriented Programming (ECOOP).2009:318-343.
[17] ZAKI M J.SPADE:An Efficient Algorithm for Mining Frequent Sequences[J].Machine Learning,2001,42(1/2):31-60.
[18] WANG J,HAN J.BIDE:efficient mining of frequent closed sequences[C]∥20th International Conference on Data Engineering.2004:79-91.
[19] MICHAIL A.Data mining library reuse patterns using generalized association rules[C]∥Proceedings of the 22nd International Conference on Software Engineering.2000:167-176.
[20] SAHAVECHAPHAN N,CLAYPOOL K.XSnippet:mining for sample code[J].ACM SIGPLAN Notices,2006,41(10):413-430.
[21] HSU S K,LIN S J.MACs:Mining API code snippets for code reuse[J].Expert Systems with Applications,2011,38(6):7291-7301.
