Computer Science, 2020, Vol. 47, Issue (12): 35-41. doi: 10.11896/jsjkx.200100022

• Software Engineering and Requirements Engineering for Complex Systems •

Classification and Analysis of Ubuntu Bug Reports Based on Topic Model

ZHOU Kai, REN Yi, WANG Zhe, GUAN Jian-bo, ZHANG Fang, ZHAO Yan-kang

  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  • Received: 2020-01-05 Revised: 2020-05-30 Online: 2020-12-15 Published: 2020-12-17
  • Corresponding author: REN Yi (renyi@nudt.edu.cn)
  • About author: alidechengbao@163.com
    ZHOU Kai, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include system software and software engineering.
    REN Yi, born in 1977, Ph.D, researcher. Her main research interests include operating systems, cloud computing and virtualization, distributed computing, large-scale mixed-source software code feature analysis and so on.
  • Supported by:
    National Natural Science Foundation of China (61872444) and National High-End Generic Chips and Basic Software Project of China (2017ZX01038104-002).

Abstract: Software bugs are one of the main causes of system failure. Developing software and repairing failures effectively requires a better understanding of bug characteristics such as their distribution. Ubuntu is a widely used open-source software platform and one of the most successful Linux distributions in the world. Mining bug reports to discover bug characteristics, classifying bugs reasonably, and analyzing the distribution and characteristics of common operating system bugs provide an important reference for code quality analysis and improvement during the development, testing, and maintenance of domestic mixed-source operating systems based on Ubuntu. First, 32805 Ubuntu bug reports are downloaded from Launchpad with a crawler. Then, a topic model is used to analyze the common bugs of Ubuntu, and, based on the composition of the operating system, the bugs are divided into five categories: kernel-related anomalies, desktop environment anomalies, network-related anomalies, hardware driver-related anomalies, and upper-layer application and development environment anomalies. Next, the classification results are evaluated with the F1 value, and the results show that the classification achieves good accuracy. Finally, the general distribution and characteristics of recent Ubuntu bugs are obtained by analyzing the statistics of the bug reports, and further analysis of the classification results yields findings and conclusions that help to better understand the bugs of the Ubuntu operating system.

Key words: Ubuntu operating system, Latent Dirichlet allocation (LDA) model, Bug classification, Bug report analysis
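
The abstract states that the classification is evaluated with the F1 value but does not say how the score is aggregated across the five categories. The following is a minimal sketch of one common choice, per-category F1 plus an unweighted (macro) average. The short category names, the helper functions per_class_f1 and macro_f1, and the toy labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical short names for the five categories named in the abstract.
CATEGORIES = ["kernel", "desktop", "network", "driver", "application"]


def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


# Toy data (assumption): manually assigned labels vs. model-assigned labels.
y_true = ["kernel", "kernel", "desktop", "network", "driver", "application"]
y_pred = ["kernel", "driver", "desktop", "network", "driver", "application"]

scores = per_class_f1(y_true, y_pred, CATEGORIES)
for label in CATEGORIES:
    print(f"{label}: F1 = {scores[label]:.3f}")
print(f"macro F1 = {macro_f1(scores):.3f}")
```

Macro averaging weights every category equally, which may matter if the bug reports are unevenly distributed across categories. A support-weighted average would be an equally plausible reading of the abstract.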

CLC Number:

  • TP311