Computer Science, 2019, Vol. 46, Issue (11A): 66-71.

Intelligent Computing

Short Text Feature Extension Method Based on Bayesian Networks

LIU Hui-qing, GUO Yan-bu, LI Hong-ling, LI Wei-hua   

1. (School of Information, Yunnan University, Kunming 650500, China)
Online: 2019-11-10  Published: 2019-11-20

Abstract: To address feature sparsity and insufficient representational ability in short texts, this paper proposes a feature extension method based on Bayesian networks. First, a semantic Bayesian network is constructed by defining the dependencies between feature words in the short texts. Next, a correlation degree between a feature word and a short text is defined, and the feature words most closely related to each short text are selected. These words are then added to the short text to reduce its noise and sparsity. Finally, the feasibility and effectiveness of the proposed method are analyzed using short text classification as a basic text analysis task. Experimental results on the Amazon product dataset show that the proposed method is feasible and effective.
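The pipeline described in the abstract (build word dependencies, score each candidate word against a short text, append the best-scoring words) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither their structure-learning procedure nor their correlation-degree formula, so the sketch substitutes (1) directed word dependencies estimated from document co-occurrence, P(b | a) = df(a, b) / df(a), kept only above a hypothetical `min_prob` threshold, and (2) a noisy-OR combination over the candidate's parents that appear in the text as the relatedness score. All function names, thresholds, and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import permutations


def conditional_probs(docs):
    """Estimate P(b | a) = df(a, b) / df(a) from document-level co-occurrence."""
    df = Counter()  # number of documents containing each word
    co = Counter()  # number of documents containing each ordered pair (a, b)
    for doc in docs:
        words = set(doc)
        df.update(words)
        co.update(permutations(words, 2))  # ordered pairs with a != b
    return {(a, b): n / df[a] for (a, b), n in co.items()}


def build_dependency_edges(probs, min_prob=0.5):
    """Keep a directed edge a -> b when P(b | a) >= min_prob.

    Hypothetical stand-in for learning the semantic Bayesian network:
    returns {child: {parent: P(child | parent)}}.
    """
    edges = {}
    for (a, b), p in probs.items():
        if p >= min_prob:
            edges.setdefault(b, {})[a] = p
    return edges


def relatedness(candidate, text, edges):
    """Noisy-OR score (assumed, not the paper's formula): 1 - prod(1 - P(candidate | w))
    over the candidate's parents w that occur in the short text."""
    parents = edges.get(candidate, {})
    miss = 1.0  # probability that no parent present in the text activates the candidate
    for w in set(text):
        if w in parents:
            miss *= 1.0 - parents[w]
    return 1.0 - miss


def extend_text(text, edges, threshold=0.6, top_k=3):
    """Append up to top_k new words whose relatedness is at least `threshold`."""
    present = set(text)
    scored = [(relatedness(c, text, edges), c) for c in edges if c not in present]
    scored.sort(key=lambda sc: (-sc[0], sc[1]))  # highest score first, ties alphabetical
    chosen = [c for s, c in scored if s >= threshold][:top_k]
    return list(text) + chosen


# Toy corpus of tokenized short texts (hypothetical data).
docs = [
    ["battery", "charger", "phone"],
    ["battery", "charger", "laptop"],
    ["battery", "charger"],
    ["screen", "phone"],
    ["novel", "author"],
]
edges = build_dependency_edges(conditional_probs(docs), min_prob=0.5)
print(extend_text(["phone", "laptop"], edges, threshold=0.6))
# ['phone', 'laptop', 'battery', 'charger']
```

In this toy run, "screen" depends only on "phone" (score 0.5) and falls below the threshold, while "battery" and "charger" are reinforced by both "phone" and "laptop". Raising `threshold` or lowering `top_k` appends fewer, more strongly supported words, which is the trade-off between reducing sparsity and avoiding added noise.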

Key words: Bayesian network, Feature extension, Short text, Text analysis

CLC Number: TP391