Computer Science ›› 2019, Vol. 46 ›› Issue (12): 201-207. doi: 10.11896/jsjkx.181001856
LIU Chun, ZHANG Guo-liang
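Abstract: Extracting software features from the textual descriptions of software products has attracted considerable attention in recent years. Because a sentence in a product description conveys the meaning of a feature more clearly, and because a single sentence may involve several software features, this paper proposes a method that extracts software features by discovering overlapping sentence clusters in the textual descriptions of software products. Building on the LFM overlapping community detection algorithm from complex network research, the method defines a custom similarity measure between sentences, constructs a sentence similarity network, and then detects the sentence communities in that network, thereby clustering the sentences. Each sentence community embodies one software feature and contains all sentences that potentially describe that feature. The detected communities may share sentences, and such overlapping sentences relate to the features of every community they belong to. Furthermore, to help people understand the feature embodied by each community, the method includes an algorithm that repeatedly selects the community with the lowest entropy among all sentence communities and picks from it the most representative sentence not yet chosen for another community as the descriptor of that community's feature. Software product descriptions crawled from Softpedia.com serve as the experimental data. The experimental results show that the proposed method outperforms existing representative methods in both accuracy and running time.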
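The pipeline described in the abstract (similarity network, then overlapping community detection) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the abstract does not give the custom similarity measure, so a plain word-level Jaccard score stands in for it; the `threshold` and `alpha` parameters and all function names are hypothetical; and the community growth is a simplified variant of LFM that maximizes the fitness k_in / (k_in + k_out)^alpha but omits LFM's node-removal step.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the paper's custom measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_network(sentences, threshold=0.2):
    """Weighted adjacency dict linking sentences with similarity >= threshold."""
    adj = {i: {} for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        w = jaccard(sentences[i], sentences[j])
        if w >= threshold:
            adj[i][j] = adj[j][i] = w
    return adj

def fitness(adj, members, alpha=1.0):
    """LFM-style fitness k_in / (k_in + k_out)^alpha of a node set."""
    k_in = k_out = 0.0
    for u in members:
        for v, w in adj[u].items():
            if v in members:
                k_in += w   # internal edges are counted from both endpoints
            else:
                k_out += w
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0

def grow(adj, seed, alpha=1.0):
    """Greedily add the neighbor with the largest positive fitness gain."""
    community = {seed}
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        base = fitness(adj, community, alpha)
        gains = {v: fitness(adj, community | {v}, alpha) - base for v in frontier}
        if not gains:
            break
        best = max(sorted(gains), key=gains.get)  # ties go to the lowest index
        if gains[best] <= 0:
            break
        community.add(best)
    return community

def overlapping_communities(adj, alpha=1.0):
    """Seed a new community from every node not yet covered by one."""
    communities, covered = [], set()
    for seed in sorted(adj):
        if seed not in covered:
            found = grow(adj, seed, alpha)
            communities.append(found)
            covered |= found
    return communities

# Toy network: two 4-cliques with unit weights that share node 3.
demo = {i: {} for i in range(7)}
for clique in ([0, 1, 2, 3], [3, 4, 5, 6]):
    for u, v in combinations(clique, 2):
        demo[u][v] = demo[v][u] = 1.0
print([sorted(c) for c in overlapping_communities(demo)])
# [[0, 1, 2, 3], [3, 4, 5, 6]]
```

In the toy network, node 3 lands in both communities, which mirrors a sentence that touches two software features. Communities grow from uncovered seeds only, but a grown community may absorb nodes that already belong to another one, and that is where the overlap comes from.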