Computer Science, 2020, Vol. 47, Issue (3): 34-40. doi: 10.11896/jsjkx.190300053
ZHANG Yun-fan (张云帆)¹, ZHOU Yu (周宇)¹,², HUANG Zhi-qiu (黄志球)¹,²
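Abstract: In software development, reusing application programming interfaces (APIs) can improve development efficiency, but using an unfamiliar API is time-consuming and difficult. Existing studies typically treat an API as the user's query and make recommendations by searching a corpus for usage patterns of that API, which does not match how developers actually search. This paper proposes an API usage pattern recommendation method based on natural-language semantic similarity (Semantic Similarity Based API Recommendation, SSAPIR). The method uses hierarchical clustering to extract API usage patterns, then computes the semantic similarity between the query and the descriptions of the API usage patterns, and recommends to developers the patterns that are both highly relevant and widely used. To validate SSAPIR, API usage patterns and their descriptions were extracted for 9 popular third-party API libraries from high-quality Java projects on GitHub, and usage patterns were recommended in response to natural-language queries about these 9 libraries. Effectiveness is measured by the Hit@K accuracy of the recommendations. The experimental results show that hierarchical clustering effectively improves recommendation accuracy, and that SSAPIR reaches an average Hit@10 accuracy of 85.02%, outperforming existing work. SSAPIR therefore handles the API usage pattern recommendation task well and returns accurate API usage patterns for developers' natural-language queries.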
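The abstract reports results as Hit@K accuracy without defining it. As a rough illustration only, the sketch below uses the common definition: a query counts as a hit if at least one relevant item appears among the top K recommendations, and the score is the fraction of queries that are hits. The function names, the pattern identifiers, and the toy data are all hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> bool:
    """Return True if any of the top-k ranked items is in the relevant set."""
    return any(item in relevant for item in ranked[:k])


def mean_hit_at_k(
    results: List[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    if not results:
        return 0.0
    hits = sum(hit_at_k(ranked, relevant, k) for ranked, relevant in results)
    return hits / len(results)


# Hypothetical toy data: (ranked recommendations, relevant pattern ids) per query.
queries = [
    (["p3", "p1", "p7"], {"p1"}),  # first relevant item at rank 2
    (["p2", "p5", "p9"], {"p9"}),  # first relevant item at rank 3
    (["p4", "p6", "p8"], {"p0"}),  # no relevant item recommended
]

for k in (1, 2, 3):
    print(f"Hit@{k} = {mean_hit_at_k(queries, k):.2f}")
```

Under this definition, Hit@K can only stay the same or increase as K grows, which is why an average over queries at K = 10 is a fairly lenient measure compared with Hit@1.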