Computer Science ›› 2020, Vol. 47 ›› Issue (3): 34-40. doi: 10.11896/jsjkx.190300053

Special topic: Intelligent Software Engineering

• Intelligent Software Engineering •

Semantic Similarity Based API Usage Pattern Recommendation

ZHANG Yun-fan1, ZHOU Yu1,2, HUANG Zhi-qiu1,2

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2019-03-15 Online: 2020-03-15 Published: 2020-03-30
  • Corresponding author: ZHOU Yu (zhouyu@nuaa.edu.cn)
  • About author: ZHANG Yun-fan, postgraduate. His research interests include software evolution analysis, artificial intelligence, and mining software repositories. ZHOU Yu, postdoctor, professor. His research interests mainly include software evolution analysis, mining software repositories, software architecture, and reliability analysis.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003902; listed as 2018YFB1003900 in the Chinese record), the Fundamental Research Funds for the Central Universities (NS2019055), and the Qing Lan Project of Jiangsu Higher Education Institutions.

Abstract: In software development, reusing application programming interfaces (APIs) can improve development efficiency, but using an unfamiliar API is difficult and time-consuming. Previous studies tend to take an API as the query and search a corpus for usage patterns of that API, which does not match how developers actually search. This paper proposes a Semantic Similarity based API Usage Pattern Recommendation approach (SSAPIR). The approach first uses a hierarchical clustering algorithm to extract API usage patterns, and then calculates the semantic similarity between a natural language query and the description information of each API usage pattern, in order to recommend highly relevant and widely used API usage patterns to developers. To verify the effectiveness of SSAPIR, API usage patterns and their description information for 9 popular third-party API libraries are extracted from high-quality Java projects on GitHub, and API usage patterns are recommended for natural language queries related to these 9 libraries. The recommendation results are evaluated with Hit@K accuracy. The experimental results show that hierarchical clustering effectively improves recommendation accuracy and that SSAPIR achieves an average accuracy of 85.02% in terms of Hit@10, outperforming existing work. SSAPIR can therefore provide accurate API usage pattern recommendations for natural language queries entered by developers.

Key words: API usage pattern recommendation, Hierarchical clustering, Semantic similarity
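
The recommendation step described in the abstract (matching a natural language query against pattern descriptions and preferring widely used patterns) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so this example substitutes plain term-frequency cosine similarity; the function names, the tuple layout, and the tie-breaking by usage count are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase the text and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(query, patterns, k=10):
    # patterns: list of (description, api_sequence, usage_count) tuples.
    # This layout is hypothetical; the paper's data model is not given here.
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(desc))), count, seq)
        for desc, seq, count in patterns
    ]
    # Rank by similarity first, then by how often the pattern is used.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [seq for _, _, seq in scored[:k]]
```

With a small hand-made set of descriptions, `recommend("how to parse a json string", patterns, k=1)` returns the API sequence whose description shares the most query terms. A real system would likely replace the term-frequency vectors with a stronger semantic representation.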

CLC Number: TP391
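
The Hit@K figure reported in the abstract can be made concrete with a short sketch. It assumes the common definition of Hit@K, namely the fraction of queries for which at least one correct pattern appears among the top K recommendations; the abstract does not spell out the exact definition, and the function name and dictionary layout are hypothetical.

```python
def hit_at_k(recommendations, relevant, k):
    # recommendations: dict mapping each query to its ranked list of pattern ids.
    # relevant: dict mapping each query to the set of pattern ids judged correct.
    if not recommendations:
        return 0.0
    hits = 0
    for query, ranked in recommendations.items():
        correct = relevant.get(query, set())
        # A query counts as a hit if any of its top-k results is correct.
        if any(pattern in correct for pattern in ranked[:k]):
            hits += 1
    return hits / len(recommendations)
```

Under this definition, a Hit@10 of 85.02% would mean that for roughly 85 out of 100 queries, a correct usage pattern appears somewhere in the first ten results.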