Computer Science (计算机科学), 2018, Vol. 45, Issue 6: 208-210. doi: 10.11896/j.issn.1002-137X.2018.06.037
许卓斌, 郑海山, 潘竹虹
XU Zhuo-bin, ZHENG Hai-shan, PAN Zhu-hong
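Abstract: A vectorized representation of words is a prerequisite for text mining applications. To improve the effectiveness of autoencoders for word embedding and to raise text classification accuracy, this paper proposes an improved autoencoder and applies it to text classification. Building on the traditional autoencoder, a global adjustment function is added to the hidden layer; it shifts feature values with small absolute values onto feature values with large absolute values, thereby sparsifying the hidden-layer feature vectors. The adjusted feature vectors are then fed to a fully connected neural network for text classification. Experiments on the 20news dataset show that the proposed method yields better word embeddings and also achieves better text classification performance.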
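The abstract does not give the formula of the global adjustment function, so the following is only a minimal sketch of one way such a hidden-layer sparsification could work. It rests on several assumptions: the names `global_adjust`, `forward`, and `keep` are hypothetical, the tanh activation and the weight shapes are illustrative, and the rule of folding the magnitude of the small entries into the largest ones is an assumption, not the authors' method.

```python
import numpy as np


def global_adjust(h, keep=2):
    """Hypothetical sparsifying adjustment (NOT the paper's exact formula).

    Keeps the `keep` entries with the largest absolute value in each row,
    zeroes the rest, and rescales the kept entries so that the magnitude
    removed from the small entries is moved onto the large ones.
    """
    h = np.asarray(h, dtype=float)
    # Indices of the entries with the largest absolute value per row.
    order = np.argsort(-np.abs(h), axis=-1)
    large_idx = order[..., :keep]
    mask = np.zeros(h.shape, dtype=bool)
    np.put_along_axis(mask, large_idx, True, axis=-1)
    # Total magnitude held by the small and the large entries.
    small_mass = np.where(mask, 0.0, np.abs(h)).sum(axis=-1, keepdims=True)
    large_mass = np.where(mask, np.abs(h), 0.0).sum(axis=-1, keepdims=True)
    # Scale the large entries so the row's L1 norm is unchanged.
    scale = 1.0 + small_mass / np.maximum(large_mass, 1e-12)
    return np.where(mask, h * scale, 0.0)


def forward(x, W_enc, b_enc, W_dec, b_dec, keep=2):
    """Autoencoder forward pass with the adjustment applied to the hidden layer."""
    hidden = np.tanh(x @ W_enc + b_enc)          # encoder (assumed activation)
    sparse_hidden = global_adjust(hidden, keep)  # assumed adjustment step
    recon = sparse_hidden @ W_dec + b_dec        # linear decoder
    return sparse_hidden, recon


rng = np.random.default_rng(0)
x = rng.random((3, 8))                           # 3 toy input vectors
W_enc, b_enc = rng.normal(size=(8, 5)), np.zeros(5)
W_dec, b_dec = rng.normal(size=(5, 8)), np.zeros(8)
features, recon = forward(x, W_enc, b_enc, W_dec, b_dec, keep=2)
```

In a pipeline like the one described, `features` would be the sparse vectors passed on to the fully connected classifier. Training the weights (for example by minimizing the reconstruction error) is omitted here.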