Computer Science, 2021, Vol. 48, Issue (6): 48-56. doi: 10.11896/jsjkx.200800217
刘蕴涵, 沙朝锋, 牛军钰
LIU Yun-han, SHA Chao-feng, NIU Jun-yu
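Abstract: Database management systems are relatively mature software, yet developers still run into a wide range of problems when they use them for data management and data analysis, and they turn to Q&A forums such as Stack Overflow for solutions. This paper collects 94,473 database-related questions from Stack Overflow and applies the LDA topic model to group them into 25 topics. The results show that developers' questions fall into topics such as "table", "SQL", and "SELECT". An analysis of the popularity and difficulty of the database-related topics shows that questions on the "SQL" topic are comparatively popular. The paper also examines three database systems separately, namely MySQL, Oracle, and MongoDB, and analyzes how the questions about each system are distributed across topics. These findings help clarify the challenges database developers face, and can inform version updates of database systems, the content of database courses, and even research problems in the database field.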
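The abstract does not say which LDA implementation or hyperparameters the study used. As a minimal sketch under stated assumptions, the Python code below fits LDA with collapsed Gibbs sampling (the sampler described by Griffiths and Steyvers) on pre-tokenized questions, then labels each question with its dominant topic so that topic shares can be counted, for example separately for MySQL, Oracle, or MongoDB questions. The function names `lda_gibbs`, `dominant_topic`, and `top_words`, the priors `alpha=0.1` and `beta=0.01`, the iteration count, and the four toy "questions" are all hypothetical; the study itself used 25 topics over 94,473 questions.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    Hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # times word v is assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def dominant_topic(theta_d):
    """Index of the most probable topic for one document."""
    return max(range(len(theta_d)), key=lambda k: theta_d[k])


def top_words(phi_k, vocab, n=5):
    """The n highest-probability words of one topic, used to label it."""
    return [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -phi_k[v])[:n]]


# Toy usage with four made-up, already-preprocessed "questions".
docs = [
    "select join table query where".split(),
    "select query join index table".split(),
    "mongodb document collection json".split(),
    "mongodb collection json aggregate".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
share = Counter(dominant_topic(t) for t in theta)  # questions per dominant topic
for k in range(2):
    print(k, share[k], top_words(phi[k], vocab, n=3))
```

The same loop would in principle apply with `num_topics=25` and real, preprocessed question text, but a pure-Python sampler is slow at the scale of tens of thousands of questions, so a library implementation would normally be used there. Labeling a topic by its top words mirrors how topics such as "table" or "SQL" are typically named.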