Computer Science, 2018, Vol. 45, Issue (6): 208-210. doi: 10.11896/j.issn.1002-137X.2018.06.037

• Artificial Intelligence •

Improved Autoencoder Based Classification Algorithm for Text

XU Zhuo-bin, ZHENG Hai-shan, PAN Zhu-hong   

  1. Information and Network Center, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2018-02-28 Online: 2018-06-15 Published: 2018-07-24

Abstract: Vector representation of words is a prerequisite for text mining applications. To improve the effectiveness of autoencoders in word embedding and the accuracy of text classification, this paper proposes an improved autoencoder and applies it to text classification. Building on the traditional autoencoder, a global adjustable function is added to the latent layer; it adjusts smaller absolute values to bigger absolute values and enforces sparsity of the feature vector in the latent layer. The adjusted latent feature vector is then fed to a fully connected neural network to classify text. Experiments on the 20news dataset show that the proposed method is more effective in word embedding and performs better in text classification.

Key words: Autoencoder, Embedding vector, Neural network, Text mining

CLC Number: TP391.4
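
The pipeline summarized in the abstract (encoder, adjustment of the latent vector, decoder, and a fully connected classifier on the adjusted latent vector) can be illustrated with a minimal forward-pass sketch. The abstract does not give the form of the adjustment function, so the power-law function `adjust` below is purely an assumption chosen because it enlarges small absolute values in (-1, 1); it is not the authors' function and does not claim to reproduce the sparsity behaviour they report. All sizes, weights, and names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def adjust(h, alpha=0.5):
    """Hypothetical latent adjustment (an assumption, not the paper's function).

    For h in (-1, 1) and 0 < alpha < 1, sign(h) * |h|**alpha enlarges
    small absolute values while preserving the sign.
    """
    return np.sign(h) * np.abs(h) ** alpha

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sizes: 4 documents, 50-term vocabulary, 8 latent units, 20 classes.
n_docs, vocab, latent, n_classes = 4, 50, 8, 20
x = rng.random((n_docs, vocab))            # stand-in for bag-of-words features

# Random, untrained weights; a real model would learn these.
W_enc = rng.normal(0.0, 0.1, (vocab, latent))
W_dec = rng.normal(0.0, 0.1, (latent, vocab))
W_cls = rng.normal(0.0, 0.1, (latent, n_classes))

h = np.tanh(x @ W_enc)                     # latent vector, values in (-1, 1)
h_adj = adjust(h)                          # adjusted latent vector
x_rec = h_adj @ W_dec                      # linear reconstruction of the input
recon_loss = np.mean((x - x_rec) ** 2)     # autoencoder reconstruction error
probs = softmax(h_adj @ W_cls)             # classifier output per document
```

In a trained version of this sketch, the reconstruction loss would drive the encoder and decoder weights, and the classifier weights would be fitted on labelled documents using the adjusted latent vectors as input.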