Computer Science ›› 2020, Vol. 47 ›› Issue (12): 35-41. doi: 10.11896/jsjkx.200100022

Special topic: Software Engineering and Requirements Engineering for Complex Systems


Classification and Analysis of Ubuntu Operating System Bug Reports Based on a Topic Model

周凯, 任怡, 汪哲, 管剑波, 张芳, 赵言亢   

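  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  • Received: 2020-01-05 Revised: 2020-05-30 Publication date: 2020-12-15 Release date: 2020-12-17
  • Corresponding author: 任怡 (renyi@nudt.edu.cn)
  • About the author: alidechengbao@163.com
  • Funding:
    National Natural Science Foundation of China (61872444); National High-End Generic Chips and Basic Software Major Project of China (2017ZX01038104-002)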

Classification and Analysis of Ubuntu Bug Reports Based on Topic Model

ZHOU Kai, REN Yi, WANG Zhe, GUAN Jian-bo, ZHANG Fang, ZHAO Yan-kang   

  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  • Received:2020-01-05 Revised:2020-05-30 Online:2020-12-15 Published:2020-12-17
  • About author: ZHOU Kai, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include system software and software engineering.
    REN Yi, born in 1977, Ph.D, researcher. Her main research interests include operating systems, cloud computing and virtualization, distributed computing, and large-scale mixed-source software code feature analysis.
  • Supported by:
    National Natural Science Foundation of China(61872444) and National High-End Generic Chips and Basic Software Project of China(2017ZX01038104-002).

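Abstract: Software defects (bugs) are one of the main causes of system failure. To develop software and repair software failures more effectively, a better understanding of defect characteristics such as their distribution is needed. Ubuntu is a widely used open-source software system and currently one of the most successful Linux distributions worldwide. Using bug reports to uncover the characteristics of software defects, classifying the defects sensibly, and analyzing the distribution patterns and characteristics of common operating system defects provides an important reference for analyzing and improving code quality during the development, testing, and maintenance of domestic mixed-source operating systems based on Ubuntu. First, 32805 Ubuntu bug reports are obtained from Launchpad. Then, a topic model is used to analyze the common defects in Ubuntu and, taking into account the composition of the operating system, the defects are divided into kernel-related anomalies, desktop environment anomalies, network-related anomalies, hardware driver-related anomalies, and anomalies related to upper-layer applications and the development environment. The classification results are then evaluated with the F1 score, and the results show that the defect classification has good accuracy. Finally, the general distribution patterns and characteristics of recent Ubuntu defects are obtained by analyzing statistics of the bug reports, and the classification results yield findings and conclusions that help to further understand Ubuntu defects.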
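The topic-model step mentioned in the abstract can be illustrated with a minimal sketch of LDA fitted by collapsed Gibbs sampling. This is a generic textbook formulation, not the authors' implementation: the function name `lda_gibbs`, the hyperparameter values, and the toy token lists are all assumptions made for illustration, and preprocessing of real bug reports (tokenizing, stop-word removal, choosing the number of topics) is omitted.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z=t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


# Hypothetical, hand-made token lists standing in for bug report text.
docs = [
    "kernel panic boot kernel oops".split(),
    "wifi network connection dropped wifi".split(),
    "kernel oops boot hang".split(),
    "network dns connection timeout".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print("topic", k, top_words)
```

The top words of each topic are what an analyst would inspect before mapping topics to categories such as kernel-related or network-related anomalies; that mapping is a manual judgment and is not shown here.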

Keywords: LDA model, Ubuntu operating system, Bug report analysis, Bug classification

Abstract: Software bugs are one of the main causes of system failure. A better understanding of bug characteristics is needed to develop software and repair failures. Ubuntu is one of the most successful distributions of the Linux operating system and a popular open-source software platform worldwide. Using bug reports to discover the characteristics of software bugs, and to analyze and sensibly classify the common bugs of the operating system, provides important guidance for bug analysis during the development, testing, and maintenance of domestic mixed-source operating systems based on Ubuntu. First, 32805 bug reports are downloaded from Launchpad with a crawler. Then, by analyzing the common bugs of Ubuntu with a topic model, and based on the composition of the Ubuntu operating system and on experience, the bugs are divided into 5 categories: kernel-related anomalies, desktop environment anomalies, network anomalies, hardware driver-related anomalies, and system management anomalies. Next, the classification results are evaluated with the F1 value. Finally, the general distribution patterns and characteristics of recent bugs in the Ubuntu operating system are obtained by analyzing the statistics of the bug reports. In addition, further analysis of the classification results yields findings and conclusions that help to better understand the bugs of the Ubuntu operating system.
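The F1 evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard per-class F1 score (the harmonic mean of precision and recall), not the authors' evaluation code: the function name `per_class_f1`, the short category labels, and the six example labels are hypothetical.

```python
def per_class_f1(true_labels, predicted_labels):
    """Return {label: F1} where F1 = 2 * P * R / (P + R) for each label."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Hypothetical manual labels versus labels assigned by a classifier.
true_labels = ["kernel", "kernel", "desktop", "network", "driver", "application"]
predicted = ["kernel", "desktop", "desktop", "network", "driver", "application"]
scores = per_class_f1(true_labels, predicted)
for label, f1 in scores.items():
    print(f"{label}: {f1:.3f}")
```

Reporting F1 per category, rather than a single accuracy figure, makes it visible when one category (here `kernel`, with one report misassigned to `desktop`) is classified less reliably than the others.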

Key words: Bug classification, Bug report analysis, Latent Dirichlet allocation model, Ubuntu operating system

CLC Number:

  • TP311