计算机科学 (Computer Science), 2020, Vol. 47, Issue (3): 34-40. doi: 10.11896/jsjkx.190300053
Special topic: Intelligent Software Engineering
ZHANG Yun-fan (张云帆)1, ZHOU Yu (周宇)1,2, HUANG Zhi-qiu (黄志球)1,2
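Abstract: In software development, reusing application programming interfaces (APIs) can improve development efficiency, but using an unfamiliar API is time-consuming and difficult. Existing studies usually take an API as the user's query and recommend usage patterns by searching a corpus for that API, which does not match how developers actually search. This paper proposes an API usage pattern recommendation method based on natural language semantic similarity (Semantic Similarity Based API Recommendation, SSAPIR). The method uses a hierarchical clustering algorithm to extract API usage patterns, then computes the semantic similarity between the query and the description of each API usage pattern in order to recommend usage patterns that are both highly relevant and widely used. To evaluate SSAPIR, API usage patterns and their descriptions were extracted for 9 popular third-party API libraries from high-quality Java projects on GitHub, and usage patterns were recommended in response to natural language queries about these 9 libraries. Effectiveness was measured by the Hit@K accuracy of the recommendations. The experimental results show that hierarchical clustering effectively improves recommendation accuracy and that SSAPIR reaches an average Hit@10 accuracy of 85.02%, outperforming existing work. SSAPIR therefore handles the API usage pattern recommendation task well and returns accurate API usage patterns for developers' natural language queries.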
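The abstract reports results as Hit@K but does not spell out the formula. The sketch below assumes the common definition: a query counts as a hit if at least one relevant usage pattern appears among its top K recommendations, and Hit@K is the fraction of queries that are hits. The function name `hit_at_k`, the pattern identifiers, and the toy queries are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def hit_at_k(ranked: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int) -> float:
    """Fraction of queries with at least one relevant pattern in the top k.

    ranked:   query -> recommended pattern ids, best first (hypothetical ids)
    relevant: query -> set of pattern ids judged relevant for that query
    """
    if not ranked:
        return 0.0
    hits = 0
    for query, results in ranked.items():
        wanted = relevant.get(query, set())
        # A query is a hit if any of its top-k recommendations is relevant.
        if any(pattern in wanted for pattern in results[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data, invented purely for illustration.
ranked = {
    "read a json file": ["p3", "p1", "p7"],
    "send http get request": ["p9", "p4", "p2"],
}
relevant = {
    "read a json file": {"p1"},
    "send http get request": {"p5"},
}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"Hit@{k} = {hit_at_k(ranked, relevant, k):.2f}")
```

Under this definition, Hit@K can only stay the same or increase as K grows, which is why a Hit@10 figure is typically the most generous of the reported cutoffs.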