Computer Science ›› 2020, Vol. 47 ›› Issue (3): 34-40. doi: 10.11896/jsjkx.190300053

• Intelligent Software Engineering •

基于语义相似度的API使用模式推荐

张云帆1,周宇1,2,黄志球1,2   

  1. 南京航空航天大学计算机科学与技术学院 南京210016;
  2. 南京航空航天大学高安全系统的软件开发与验证技术工信部重点实验室 南京211100
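  • Received: 2019-03-15 Publication date: 2020-03-15 Release date: 2020-03-30
  • Corresponding author: ZHOU Yu (zhouyu@nuaa.edu.cn)
  • Funding:
    National Key R&D Program of China (2018YFB1003900); Fundamental Research Funds for the Central Universities (NS2019055); Jiangsu Universities "Qing Lan Project"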

Semantic Similarity Based API Usage Pattern Recommendation

ZHANG Yun-fan1,ZHOU Yu1,2,HUANG Zhi-qiu1,2   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
  2. Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2019-03-15 Online: 2020-03-15 Published: 2020-03-30
  • About author: ZHANG Yun-fan, postgraduate. His research interests include software evolution analysis, artificial intelligence, and mining software repositories. ZHOU Yu, postdoctor, professor. His research interests mainly include software evolution analysis, mining software repositories, software architecture, and reliability analysis.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003902), Fundamental Research Funds for the Central Universities (NS2019055) and Qing Lan Project.

Abstract: Reusing application programming interfaces (APIs) during software development improves development efficiency, but using an unfamiliar API is difficult and time-consuming. Previous research tends to take an API itself as the query and to search a corpus for that API's usage patterns, which does not match how developers actually search. This paper proposes a Semantic Similarity based API Usage Pattern Recommendation approach (SSAPIR). The approach first applies a hierarchical clustering algorithm to extract API usage patterns, and then computes the semantic similarity between a natural language query and the description information of each API usage pattern, so as to recommend highly relevant and widely used API usage patterns to developers. To evaluate SSAPIR, high-quality Java projects were collected from GitHub, and the API usage patterns of 9 popular third-party API libraries, together with their description information, were extracted from them. API usage patterns were then recommended for natural language queries related to these 9 libraries, and the Hit@K accuracy of the recommendation results was measured. The experimental results show that hierarchical clustering effectively improves recommendation accuracy and that SSAPIR achieves an average Hit@10 accuracy of 85.02%, outperforming state-of-the-art work. SSAPIR thus handles the API usage pattern recommendation task well and provides accurate API usage patterns for developers' natural language queries.

Key words: API usage pattern recommendation, semantic similarity, hierarchical clustering

CLC Number: 

  • TP391
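
The Hit@K accuracy mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition of Hit@K (the fraction of queries for which at least one relevant item appears among the top K recommendations); the paper's exact formulation may differ. The function name `hit_at_k` and the pattern identifiers are hypothetical and serve only as an illustration.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]],
             k: int) -> float:
    """Fraction of queries whose top-k recommendations contain a relevant item.

    ranked[i]   -- recommendations for query i, best first (assumed format)
    relevant[i] -- set of items judged relevant for query i (assumed format)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have the same length")
    if not ranked:
        return 0.0
    # A query counts as a hit if any of its first k recommendations is relevant.
    hits = sum(
        1
        for recs, rel in zip(ranked, relevant)
        if any(item in rel for item in recs[:k])
    )
    return hits / len(ranked)


# Hypothetical data: three queries, each with three ranked pattern ids.
ranked = [["p1", "p2", "p3"], ["p4", "p5", "p6"], ["p7", "p8", "p9"]]
relevant = [{"p2"}, {"p9"}, {"p7"}]

for k in (1, 2, 3):
    print(f"Hit@{k} = {hit_at_k(ranked, relevant, k):.2f}")
```

With this toy data only the third query is answered at rank 1, and the second query is never answered because its relevant pattern is not among its recommendations, so raising K beyond 2 does not change the score.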