Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 478-481.

• Big Data & Data Mining •

Method of Short Text Classification Based on Frequent Item Feature Extension

JIN Yi-fan, FU Ying-xun, MA Li   

  1. College of Information, North China University of Technology, Beijing 100144, China
  • Online: 2019-06-14 Published: 2019-07-02

Abstract: Short texts have high-dimensional, sparse feature spaces, so traditional classification methods perform poorly on them. To address this problem, a short text classification method based on frequent item feature extension, called STCFIFE, is proposed. First, frequent itemsets in a background corpus are mined with the FP-growth algorithm, and the weights of the extended features are calculated by combining contextual association features. The new features are then added to the feature space of the original short text. On this basis, an SVM (Support Vector Machine) classifier is trained for classification. Experimental results show that, compared with the traditional SVM algorithm and the LDA+KNN algorithm, STCFIFE effectively alleviates feature deficiency and high-dimensional sparsity in short texts and improves the F1 value by 2% to 10%.
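
The feature extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so rule confidence is assumed as the weight of an added term; a brute-force count of co-occurring word pairs stands in for FP-growth; and the function names `mine_frequent_pairs` and `extend_features`, the toy background corpus, and the `min_support` value are all hypothetical. Training the SVM on the extended features is omitted.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_pairs(corpus, min_support):
    """Count word pairs that co-occur in at least min_support documents.

    Brute-force stand-in for FP-growth, restricted to 2-itemsets.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for doc in corpus:
        words = sorted(set(doc))  # one count per document, stable pair order
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    return word_counts, frequent


def extend_features(short_text, word_counts, frequent_pairs):
    """Return {term: weight} for a tokenized short text.

    Original terms get weight 1.0. A term that forms a frequent pair with
    an original term is added with weight = confidence(src -> dst)
    (assumed weighting, not taken from the paper).
    """
    features = {w: 1.0 for w in short_text}
    for (a, b), count in frequent_pairs.items():
        for src, dst in ((a, b), (b, a)):
            if src in short_text and dst not in short_text:
                confidence = count / word_counts[src]
                features[dst] = max(features.get(dst, 0.0), confidence)
    return features


# Hypothetical background corpus of tokenized documents.
background = [
    ["stock", "market", "shares"],
    ["stock", "market", "index"],
    ["stock", "market", "bank"],
    ["stock", "soup", "recipe"],
    ["football", "match", "goal"],
    ["football", "match", "league"],
]

word_counts, pairs = mine_frequent_pairs(background, min_support=2)
extended = extend_features(["stock", "rises"], word_counts, pairs)
print(extended)
```

In this toy run the short text gains the term "market", because "stock" and "market" co-occur in three of the four background documents that contain "stock". The extended weight vectors would then be fed to a classifier such as an SVM.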

Key words: Feature extension, Feature weight, Frequent item mining, Short text classification, Support vector machine

CLC Number: TP391