Computer Science, 2021, Vol. 48, Issue (6): 48-56. doi: 10.11896/jsjkx.200800217

• Database & Big Data & Data Science •

Analysis of Topics on Database Systems in Stack Overflow

LIU Yun-han, SHA Chao-feng, NIU Jun-yu   

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2020-08-30 Revised: 2020-10-24 Online: 2021-06-15 Published: 2021-06-03
  • About author: LIU Yun-han, born in 1996, postgraduate. Her main research interests include natural language processing and software engineering. (18212010018@fudan.edu.cn)
    SHA Chao-feng, born in 1976, Ph.D., associate professor. His main research interests include machine learning and data mining, and natural language processing.
  • Supported by:
    National Key Research and Development Program of China (2018YFB0904503).

Abstract: Database management systems are mature software systems, but software developers still encounter a variety of problems when using them to manage or analyze data, and they often turn to Stack Overflow or other community question answering (CQA) forums for solutions. In this paper, 94,473 database-related questions are collected from Stack Overflow. An LDA topic model is applied to the dataset to group these questions into 25 topics; the results show that developers' questions fall into topics such as "table", "SQL", and "SELECT". A study of the prevalence and difficulty of the database-related topics finds that some topics, such as "SQL", are more popular than others. In addition, questions about three database systems, MySQL, Oracle, and MongoDB, are examined, and the topic distribution of the questions related to each system is analyzed. The findings help to clarify the challenges faced by database developers and thus offer suggestions for new versions of database systems, the design of database courses, and research questions in the database field.
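The abstract does not state how topic prevalence is measured. As a minimal sketch of one common approach, assuming each question already has a topic distribution from a fitted LDA model, the snippet below assigns every question to its highest-probability topic and reports the share of questions per topic. The function names `dominant_topic` and `topic_shares` and the toy numbers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Return the index of the highest-probability topic for one question.

    Ties are resolved in favor of the lowest topic index.
    """
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def topic_shares(doc_topic_matrix, num_topics):
    """Return, for each topic index, the fraction of questions it dominates."""
    total = len(doc_topic_matrix)
    if total == 0:
        # No questions: every topic has a share of zero.
        return [0.0] * num_topics
    counts = Counter(dominant_topic(row) for row in doc_topic_matrix)
    return [counts.get(k, 0) / total for k in range(num_topics)]


# Toy, made-up distributions over 3 topics for 4 questions (not data from the paper).
toy = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
shares = topic_shares(toy, 3)
print(shares)  # [0.5, 0.25, 0.25]
```

In the setting described here, the matrix would have one row per question and 25 columns, one per topic; the same share computation could be restricted to questions about a single database system to compare topic distributions across systems.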

Key words: Database, LDA, Stack Overflow, Topic modeling

CLC Number: TP311