Computer Science, 2019, Vol. 46, Issue (12): 201-207. doi: 10.11896/jsjkx.181001856

• Software and Database Technology •

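Software Feature Extraction Method Based on Overlapping Community Detection

LIU Chun, ZHANG Guo-liang

  1. (School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475001, China)
  • Received: 2018-10-08  Online: 2019-12-15  Published: 2019-12-17
  • Corresponding author: ZHANG Guo-liang (born 1993), male, master's student; research interest: software requirements engineering. E-mail: glzhang@vip.henu.edu.cn.
  • About the author: LIU Chun (born 1982), male, associate professor; research interest: software requirements engineering.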

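Abstract: Extracting software features from the natural-language descriptions of software products has attracted considerable attention in recent years. Since the sentences in a product description convey the meaning of a feature more clearly, and since a single sentence may involve several software features, this paper proposes a method that extracts software features by detecting overlapping sentence clusters in product descriptions. Building on the LMF overlapping community detection algorithm for complex networks, the method defines a custom similarity metric between the sentences of the descriptions, uses it to build a sentence similarity network, and then detects sentence communities in this network, thereby clustering the sentences. Each sentence community implies one software feature and contains all the sentences that potentially describe that feature. The detected communities may share sentences; a shared sentence involves the features implied by every community it belongs to. Further, to help people better understand the feature implied by each community, the method includes an algorithm that repeatedly selects, among all sentence communities, the one with the lowest entropy, and picks from it the most representative sentence not yet chosen by any other community as the descriptor of that community's feature. Product descriptions crawled from Softpedia.com serve as the experimental data. Experimental results show that, compared with existing representative methods, the proposed method performs better in both accuracy and time consumption.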

Key words: Feature extraction, Overlapping community detection, Natural language
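The abstract describes the pipeline only in outline: build a sentence similarity network, detect overlapping sentence communities with LMF, then pick descriptors in order of community entropy. The Python sketch below illustrates those steps on toy sentences; it is not the authors' implementation, and several details are assumptions. Cosine similarity over bag-of-words counts stands in for the paper's custom similarity metric, the 0.2 edge threshold and the stopword list are arbitrary, community entropy is assumed to be the Shannon entropy of the pooled term counts, a sentence's representativeness is assumed to be its summed similarity to the other members, and communities are ranked by entropy once rather than re-evaluated after each pick. The community step uses the LMF local fitness f_G = k_in / (k_in + k_out)^α, but seeds nodes in index order rather than at random and never removes a seed from its own community. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "for", "with", "your", "is"}

def tokens(sentence):
    """Lower-case alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity of two term-count Counters (stand-in for the paper's metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_network(sentences, threshold=0.2):
    """Unweighted sentence network: link i and j when their similarity >= threshold."""
    vecs = [Counter(tokens(s)) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj, vecs

def fitness(group, adj, alpha=1.0):
    """LMF fitness f_G = k_in / (k_in + k_out) ** alpha."""
    k_in = sum(1 for v in group for u in adj[v] if u in group)  # twice the internal edges
    k_out = sum(1 for v in group for u in adj[v] if u not in group)
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0

def natural_community(seed, adj, alpha=1.0, max_steps=1000):
    """Grow the natural community of `seed` by local fitness maximisation."""
    group = {seed}
    for _ in range(max_steps):  # step cap guards against add/remove cycles
        base = fitness(group, adj, alpha)
        neighbours = sorted({u for v in group for u in adj[v]} - group)
        gains = {u: fitness(group | {u}, adj, alpha) - base for u in neighbours}
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] <= 0:
            break
        group.add(best)
        # Drop members (never the seed) whose node fitness f_G - f_{G-v} is negative.
        while True:
            f = fitness(group, adj, alpha)
            bad = [v for v in sorted(group - {seed}) if f - fitness(group - {v}, adj, alpha) < 0]
            if not bad:
                break
            group.discard(bad[0])
    return group

def lmf_cover(adj, alpha=1.0):
    """Cover all nodes with overlapping communities; seeds in index order, not random."""
    covered, communities = set(), []
    for seed in sorted(adj):
        if seed in covered:
            continue
        comm = natural_community(seed, adj, alpha)
        if comm not in communities:
            communities.append(comm)
        covered |= comm
    return communities

def term_entropy(comm, vecs):
    """Shannon entropy (bits) of a community's pooled term counts (assumed definition)."""
    pooled = sum((vecs[i] for i in comm), Counter())
    n = sum(pooled.values())
    return -sum(c / n * math.log2(c / n) for c in pooled.values()) if n else 0.0

def pick_descriptors(communities, vecs):
    """Lowest-entropy community first; each takes its most central not-yet-taken sentence."""
    taken, result = set(), {}
    order = sorted(range(len(communities)), key=lambda c: term_entropy(communities[c], vecs))
    for c in order:
        free = [i for i in sorted(communities[c]) if i not in taken]
        if not free:
            continue
        best = max(free, key=lambda i: sum(cosine(vecs[i], vecs[j])
                                           for j in communities[c] if j != i))
        taken.add(best)
        result[c] = best
    return result

sentences = [
    "Encrypt files with AES encryption.",
    "Strong AES encryption protects your files.",
    "Encrypt and compress files into a zip archive.",
    "Compress folders into a zip archive quickly.",
    "Zip archive compression supports large folders.",
]
adj, vecs = build_network(sentences)
communities = lmf_cover(adj)
for c, s in pick_descriptors(communities, vecs).items():
    print(sorted(communities[c]), "->", sentences[s])
```

On this toy input the first community holds the two AES-encryption sentences, while the community grown from sentence 2 absorbs all five, so the two communities overlap on sentences 0 and 1. In LMF a larger α tends to yield smaller communities, which is the usual lever for controlling granularity.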

CLC number: TP311