Computer Science, 2022, Vol. 49, Issue (6A): 150-158. doi: 10.11896/jsjkx.210500065

• Intelligent Computing •

Deep Integrated Learning Software Requirement Classification Fusing Bert and Graph Convolution

KANG Yan, WU Zhi-wei, KOU Yong-qi, ZHANG Lan, XIE Si-yu, LI Hao

  1. College of Software, Yunnan University, Kunming 650091, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: LI Hao (lihao707@ynu.edu.cn)
  • About author: KANG Yan, born in 1972, Ph.D., associate professor. Her main research interests include transfer learning, deep learning and integrated learning. (562530855@qq.com)
    LI Hao, born in 1970, Ph.D., professor. His main research interests include distributed computing, grid and cloud computing.
  • Supported by:
    National Natural Science Foundation of China (61762092), Major Project of Yunnan Provincial Science and Technology Department (2019ZE001-1, 202002AB080001-6) and Open Foundation of Yunnan Key Laboratory of Software Engineering (2020SE303).

Abstract: With the rapid growth in the number and variety of software, effectively mining the textual features of software requirements and classifying the textual features of functional requirements has become a major challenge in software engineering. Classifying software functional requirements provides a reliable safeguard for the whole software development process and reduces potential risks and negative effects in the requirements analysis stage. However, the high dispersion, high noise and data sparsity of software requirement text limit the effectiveness of requirement analysis. This paper proposes a two-layer vocabulary graph convolutional network model (TVGCCN) that innovatively models software requirement text as a graph, building a graph neural network of software requirements that effectively captures the knowledge edges between words and the relationships between words and texts. It further proposes a deep ensemble learning model that integrates several deep learning classification models to classify software requirement text. In experiments on the Windows_a and Windows_b datasets, the deep ensemble learning model fusing Bert and graph convolution reaches accuracies of 96.73% and 95.60% respectively, clearly outperforming the other text classification models compared. These results demonstrate that the model can effectively distinguish the functional characteristics of software requirement text and improve the accuracy of software requirement text classification.
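The abstract describes modeling requirement text as a graph whose edges link words to words and words to texts, and then applying graph convolution over it. As a minimal sketch, assuming a TextGCN-style construction (Yao et al.: positive PMI over sliding windows for word-word edges, TF-IDF for word-document edges) and the two-layer GCN propagation rule of Kipf and Welling, the NumPy code below builds such a graph and runs one untrained forward pass. The function names (`build_text_graph`, `normalize_adj`, `gcn_forward`), the window size, the example sentences, the class count and the random weights are all hypothetical; the paper's actual graph construction and layers may differ.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def build_text_graph(docs, window=3):
    """Adjacency over [document nodes + word nodes], TextGCN-style (assumed)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    w_idx = {w: i for i, w in enumerate(vocab)}
    n_docs = len(docs)
    n = n_docs + len(vocab)
    adj = np.zeros((n, n))

    # Word-document edges weighted by TF-IDF.
    df = Counter(w for toks in tokenized for w in set(toks))
    for d, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            j = n_docs + w_idx[w]
            adj[d, j] = adj[j, d] = (c / len(toks)) * math.log(n_docs / df[w])

    # Word-word edges weighted by positive PMI over sliding windows.
    windows = []
    for toks in tokenized:
        if len(toks) <= window:
            windows.append(toks)
        else:
            windows.extend(toks[i:i + window] for i in range(len(toks) - window + 1))
    n_win = len(windows)
    win_count, pair_count = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        win_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    for (a, b), c in pair_count.items():
        pmi = math.log(c * n_win / (win_count[a] * win_count[b]))
        if pmi > 0:
            i, j = n_docs + w_idx[a], n_docs + w_idx[b]
            adj[i, j] = adj[j, i] = pmi

    np.fill_diagonal(adj, 1.0)  # self-loops, i.e. A + I
    return adj, vocab


def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (adj already has self-loops)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(a_hat, x, w0, w1):
    """Two-layer GCN: softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1)."""
    h = np.maximum(a_hat @ x @ w0, 0.0)
    logits = a_hat @ h @ w1
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical requirement sentences, for illustration only.
docs = [
    "the system shall export monthly reports",
    "the user shall log in with a password",
    "reports shall load within two seconds",
]
adj, vocab = build_text_graph(docs)
a_hat = normalize_adj(adj)
rng = np.random.default_rng(0)
x = np.eye(adj.shape[0])  # one-hot node features
w0 = rng.normal(scale=0.1, size=(adj.shape[0], 16))
w1 = rng.normal(scale=0.1, size=(16, 2))  # 2 classes, chosen arbitrarily here
probs = gcn_forward(a_hat, x, w0, w1)
print(probs[: len(docs)])  # untrained class probabilities of the document nodes
```

The first `len(docs)` rows of `probs` belong to the document nodes. In a TextGCN-style setup the weights would be trained with cross-entropy on the labelled document nodes only, while word nodes pass information between documents through the shared graph.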

Key words: BERT, Ensemble learning, GCN, Software requirements, Text classification, Text features
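The abstract says the deep ensemble model integrates several deep learning classifiers (including Bert and graph-convolution models) but does not state the combination rule. A minimal NumPy sketch of the two most common rules is shown below: soft voting averages the predicted class probabilities (optionally weighted), and hard voting takes a majority vote over each model's predicted label. The function names and the probability values for a BERT model, a GCN model and a third base model are invented for illustration, and which rule the authors actually use is an assumption.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Weighted average of (n_samples, n_classes) probability matrices."""
    probs = np.stack(prob_list)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    avg = np.tensordot(weights / weights.sum(), probs, axes=1)  # (n_samples, n_classes)
    return avg.argmax(axis=1), avg


def hard_vote(prob_list):
    """Majority vote over each model's argmax label (ties go to the lowest label)."""
    preds = np.stack([p.argmax(axis=1) for p in prob_list])  # (n_models, n_samples)
    n_classes = prob_list[0].shape[1]
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)


# Made-up softmax outputs of three hypothetical base models on 3 samples.
bert_probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
gcn_probs = np.array([[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]])
third_probs = np.array([[0.8, 0.2], [0.45, 0.55], [0.6, 0.4]])

labels, avg = soft_vote([bert_probs, gcn_probs, third_probs])
print(labels)  # [0 0 1]
print(hard_vote([bert_probs, gcn_probs, third_probs]))  # [0 1 1]
```

The two rules disagree on the second sample: soft voting follows the GCN model's confident 0.7 for class 0, while hard voting sides with the two models that lean only weakly toward class 1. Soft voting therefore needs reasonably calibrated probabilities from every base model.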

CLC number: TP181