Computer Science, 2018, Vol. 45, Issue (6): 208-210. doi: 10.11896/j.issn.1002-137X.2018.06.037

• Artificial Intelligence •

Improved Autoencoder Based Classification Algorithm for Text

XU Zhuo-bin, ZHENG Hai-shan, PAN Zhu-hong   

  1. Information and Network Center, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2018-02-28 Online: 2018-06-15 Published: 2018-07-24

Abstract: Vector representation of words is a prerequisite for text mining applications. To improve the effectiveness of autoencoders for word embedding and the accuracy of text classification, this paper proposes an improved autoencoder and applies it to text classification. Building on the traditional autoencoder, a global adjustment function is added to the latent layer; it adjusts smaller absolute values to larger absolute values and enforces sparsity of the feature vector in the latent layer. A fully connected neural network then classifies text from the adjusted latent feature vector. Experiments on the 20news dataset show that the proposed method is more effective for word embedding and performs better in text classification.

Key words: Text mining, Autoencoder, Embedding vector, Neural network

CLC Number: 

  • TP391.4
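
The pipeline outlined in the abstract (encode, adjust the latent vector, then classify with a fully connected layer) can be sketched roughly as follows. This is a minimal forward-pass illustration only, not the paper's implementation: the function `adjust`, its exponent `k`, the layer sizes, and the random untrained weights are all assumptions, since the abstract does not give the actual adjustment function or the training procedure. The stand-in `adjust` only mimics the stated property of moving small absolute values toward larger ones; it does not attempt to reproduce the sparsity behavior the paper reports.

```python
import numpy as np

def adjust(h, k=3.0):
    """Hypothetical stand-in for the paper's adjustment function.

    Maps h to sign(h) * |h|**(1/k); for |h| < 1 and k > 1 this increases
    the absolute value while keeping the sign.
    """
    return np.sign(h) * np.abs(h) ** (1.0 / k)

def encode(x, W_enc, b_enc):
    # tanh keeps latent values in (-1, 1) before the adjustment is applied.
    return adjust(np.tanh(x @ W_enc + b_enc))

def decode(z, W_dec, b_dec):
    # Linear reconstruction of the input from the adjusted latent vector.
    return z @ W_dec + b_dec

def classify(z, W_cls, b_cls):
    # One fully connected layer followed by a numerically stable softmax.
    logits = z @ W_cls + b_cls
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy dimensions (assumed): 4 documents, 50-term vocabulary, 8 latent units,
# 20 classes as in the 20news dataset.
rng = np.random.default_rng(0)
n_docs, vocab, latent, n_classes = 4, 50, 8, 20
x = rng.random((n_docs, vocab))

W_enc = rng.normal(scale=0.1, size=(vocab, latent))
b_enc = np.zeros(latent)
W_dec = rng.normal(scale=0.1, size=(latent, vocab))
b_dec = np.zeros(vocab)
W_cls = rng.normal(scale=0.1, size=(latent, n_classes))
b_cls = np.zeros(n_classes)

z = encode(x, W_enc, b_enc)           # adjusted latent feature vectors
x_hat = decode(z, W_dec, b_dec)       # reconstruction used by the autoencoder loss
probs = classify(z, W_cls, b_cls)     # class probabilities per document
recon_loss = np.mean((x - x_hat) ** 2)
```

In a trained system the encoder and decoder weights would be fitted to minimize the reconstruction loss first, and the classifier would then be trained on the adjusted latent vectors; both steps are omitted here.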