Computer Science, 2017, Vol. 44, Issue (8): 207-215. doi: 10.11896/j.issn.1002-137X.2017.08.036

• Artificial Intelligence •

Research on Text Classification Algorithm Based on Triadic Concept Analysis

LI Zhen, ZHANG Zhuo and WANG Li-ming

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Science Fund for Young Scientists of China (61303044)

Abstract: With the emergence of three-dimensional data on the web, the advantages of triadic concept analysis (TCA) are becoming increasingly apparent. TCA is a relatively new research field with broad prospects. This paper proposes a text classification method based on TCA, which is a new idea and extends TCA toward applications. The algorithm works as follows. First, the dataset is preprocessed into a triadic context, and the binary relation in the context is extended to a fuzzy relation with values between 0 and 1 that represents the membership degree of an attribute for an object under a given condition. Triadic concepts are then built on this context and used to express the ternary relation among texts, feature terms and categories. Next, drawing on the approach degree (closeness degree) from fuzzy theory, a similarity between triadic concepts is derived by analogy, and this similarity measure is used to compute the similarity between the triadic concepts of the training set and a new text. Experimental results show that the proposed model is effective and that, on specific datasets, it achieves better classification performance than the support vector machine (SVM), K-nearest neighbor (KNN) and convolutional neural network (CNN) algorithms, as well as a classification model based on formal concept analysis.

Key words: Triadic concept analysis, Triadic concept, Fuzzy theory, Text classification, Triadic concept similarity
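
The pipeline outlined in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract gives neither the membership function nor the similarity formula, and the sketch does not construct triadic concepts at all. It only builds a fuzzy triadic context over (text, term, category) triples and scores a new text against each category. The membership weighting (term frequency divided by the highest frequency in the text), the min/max closeness degree, and all function names are assumptions chosen for illustration.

```python
from collections import Counter


def fuzzy_set(tokens):
    """Map each term to a membership in (0, 1].

    Assumption: membership = term frequency / highest frequency in the text.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def build_fuzzy_triadic_context(docs):
    """Build {(doc_id, term, category): membership} from labelled texts.

    docs is a list of (doc_id, tokens, category) triples.
    """
    context = {}
    for doc_id, tokens, category in docs:
        for term, mu in fuzzy_set(tokens).items():
            context[(doc_id, term, category)] = mu
    return context


def category_profiles(context):
    """Collapse the context per category: max membership of each term."""
    profiles = {}
    for (_doc_id, term, category), mu in context.items():
        terms = profiles.setdefault(category, {})
        terms[term] = max(mu, terms.get(term, 0.0))
    return profiles


def closeness(a, b):
    """Min/max closeness degree between two fuzzy sets (dicts term -> mu)."""
    terms = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def classify(tokens, profiles):
    """Return the category whose profile is closest to the new text."""
    new_text = fuzzy_set(tokens)
    return max(profiles, key=lambda cat: closeness(new_text, profiles[cat]))


# Toy training data (hypothetical, for illustration only).
docs = [
    ("d1", ["goal", "match", "goal"], "sports"),
    ("d2", ["stock", "market", "stock", "bank"], "finance"),
]
profiles = category_profiles(build_fuzzy_triadic_context(docs))
print(classify(["goal", "goal", "team"], profiles))
```

A faithful implementation would replace `category_profiles` with the construction of triadic concepts from the fuzzy context and replace `closeness` with the concept similarity defined in the paper.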

