Computer Science, 2019, Vol. 46, Issue (12): 69-73. doi: 10.11896/jsjkx.190400107

• Big Data & Data Science •

Short Text Feature Expansion and Classification Based on Non-negative Matrix Factorization

HUANG Meng-ting, ZHANG Ling, JIANG Wen-chao   

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2019-04-18 Online: 2019-12-15 Published: 2019-12-17

Abstract: This paper proposes a feature extension method based on non-negative matrix factorization (NMFFE) to overcome the feature sparsity of short texts. The method uses only the data itself and does not rely on external resources for feature extension. First, the relationship matrix between texts and words is factorized while taking the internal relationships of texts and words into account, and a word cluster indicator matrix is obtained with the graph dual regularization non-negative matrix tri-factorization (DNMTF) method. Then, the dimensionality of the word cluster indicator matrix is reduced to obtain the feature space. Finally, according to the degree of correlation between words, features from the feature space are added to each short text, which alleviates feature sparsity in short texts and improves the accuracy of text classification. Experimental results show that, compared with the better-performing of the BOW and Char-CNN algorithms, short text classification based on NMFFE increases accuracy by 25.77%, 10.89% and 1.79% on the Web snippets, Twitter sports and AGnews datasets, respectively. These results indicate that NMFFE is superior to the BOW and Char-CNN algorithms in both classification accuracy and robustness.
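The pipeline summarized in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a plain non-negative matrix tri-factorization with multiplicative updates, leaves out the graph dual regularization terms of DNMTF and the dimensionality-reduction step, and uses a hypothetical expansion rule (cosine similarity between rows of the word factor). The names `nmtf`, `expand_features`, `top_k` and `weight` are introduced here for illustration only.

```python
import numpy as np


def nmtf(X, k_docs, k_words, n_iter=200, eps=1e-9, seed=0):
    """Plain non-negative tri-factorization X ~ F @ S @ G.T.

    Multiplicative updates for the Frobenius loss. The graph
    regularization terms of DNMTF are deliberately omitted.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_docs)) + eps       # document-cluster factor
    S = rng.random((k_docs, k_words)) + eps  # cluster association matrix
    G = rng.random((m, k_words)) + eps      # word-cluster factor
    for _ in range(n_iter):
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def expand_features(X, G, top_k=2, weight=0.5):
    """Add to each document up to top_k absent words that correlate
    most strongly with the words it already contains (assumed rule)."""
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    unit = G / np.maximum(norms, 1e-12)
    sim = unit @ unit.T          # cosine similarity between words
    np.fill_diagonal(sim, 0.0)   # a word does not expand itself
    scores = X @ sim             # relevance of every word to each document
    scores[X > 0] = 0.0          # only words the document lacks are candidates
    X_new = X.astype(float).copy()
    for i in range(X.shape[0]):
        best = np.argsort(scores[i])[::-1][:top_k]
        best = best[scores[i, best] > 0]
        X_new[i, best] += weight  # expanded features get a reduced weight
    return X_new


# Toy document-word matrix: two groups of short texts with disjoint vocabularies.
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

F, S, G = nmtf(X, k_docs=2, k_words=2)
X_expanded = expand_features(X, G)
print(np.round(X_expanded, 2))
```

The expanded matrix could then be passed to any standard classifier. The constant `weight` is a simplification: a closer reading of the abstract suggests the added features should be weighted by the degree of word correlation, which this sketch does not attempt to reproduce.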

Key words: Short text classification, Feature extension, Non-negative matrix factorization, Feature space, Correlation

CLC Number: TP391