Computer Science ›› 2021, Vol. 48 ›› Issue (3): 220-226.doi: 10.11896/jsjkx.200200061

• Artificial Intelligence

Character-level Feature Extraction Method for Railway Text Classification

LU Bo-ren, HU Shi-zhe, LOU Zheng-zheng, YE Yang-dong   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2020-02-15  Revised: 2020-06-17  Online: 2021-03-15  Published: 2021-03-05
  • About author: LU Bo-ren, born in 1996, master. His main research interests include machine learning and natural language processing.
    LOU Zheng-zheng, born in 1984, associate professor, master's supervisor, is a member of the China Computer Federation. His main research interests include machine learning and pattern recognition.
  • Supported by:
    National Key Research and Development Program (2018YFB1201403) and Youth Program of the National Natural Science Foundation of China (61502434).

Abstract: Railway text classification is of great practical significance to the development of China's railway industry. Existing Chinese text feature extraction methods rely on word segmentation performed in advance. However, because word segmentation is inaccurate on railway text, feature extraction for such text suffers from inadequate semantic understanding and incomplete feature acquisition. To address these problems, this paper proposes CLW2V (Character Level-Word2Vec), a character-level feature extraction method that effectively handles the large number of complex technical terms found in railway text. Whereas TF-IDF and Word2Vec are based on word-level features, CLW2V is based on character-level features and extracts finer-grained text features, which avoids the poor feature extraction that traditional methods suffer from their dependence on pre-segmentation. Experiments on a railway safety supervision and licensing dataset show that, for railway text classification, CLW2V outperforms the traditional segmentation-dependent TF-IDF and Word2Vec methods.
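As a rough illustration of the idea, the sketch below splits text into characters (so no word segmentation is needed), trains Word2Vec-style character vectors with a toy skip-gram with negative sampling (SGNS) loop, and averages them into a fixed-length text feature. This is not the authors' CLW2V implementation: the abstract does not say how the character vectors are trained or combined, so the SGNS hyperparameters, the mean pooling, the function names (`char_tokens`, `train_char_word2vec`, `doc_vector`) and the three example sentences are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def char_tokens(text):
    """Split text into characters, skipping whitespace; no word segmentation needed."""
    return [ch for ch in text if not ch.isspace()]


def train_char_word2vec(corpus, dim=16, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling over character tokens (illustrative only)."""
    rng = random.Random(seed)
    sentences = [char_tokens(t) for t in corpus]
    vocab = sorted({ch for s in sentences for ch in s})
    index = {ch: i for i, ch in enumerate(vocab)}
    # Center (input) vectors start small and random; context (output) vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    # Negative-sampling distribution: character counts raised to the 3/4 power.
    counts = Counter(ch for s in sentences for ch in s)
    weights = [counts[ch] ** 0.75 for ch in vocab]

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

    for _ in range(epochs):
        for s in sentences:
            for pos, center in enumerate(s):
                c = index[center]
                for ctx_pos in range(max(0, pos - window), min(len(s), pos + window + 1)):
                    if ctx_pos == pos:
                        continue
                    # One observed (center, context) pair labelled 1, plus sampled negatives labelled 0.
                    targets = [(index[s[ctx_pos]], 1.0)]
                    targets += [(t, 0.0) for t in rng.choices(range(len(vocab)), weights=weights, k=negatives)]
                    grad_in = [0.0] * dim
                    for t, label in targets:
                        score = sum(a * b for a, b in zip(w_in[c], w_out[t]))
                        g = lr * (label - sigmoid(score))
                        for k in range(dim):
                            grad_in[k] += g * w_out[t][k]  # uses the pre-update context vector
                            w_out[t][k] += g * w_in[c][k]
                    for k in range(dim):
                        w_in[c][k] += grad_in[k]
    return {ch: w_in[i] for ch, i in index.items()}


def doc_vector(text, vectors):
    """Assumed pooling step: average the character vectors; unseen characters are skipped."""
    chars = [ch for ch in char_tokens(text) if ch in vectors]
    dim = len(next(iter(vectors.values())))
    if not chars:
        return [0.0] * dim
    return [sum(vectors[ch][k] for ch in chars) / len(chars) for k in range(dim)]


# Hypothetical example sentences, NOT taken from the paper's dataset.
corpus = ["接触网断线导致列车晚点", "道岔故障影响列车运行", "安全监督许可证书审核"]
vectors = train_char_word2vec(corpus)
features = [doc_vector(t, vectors) for t in corpus]  # one fixed-length vector per text
```

Because every string splits into characters deterministically, a technical term that a segmenter would cut incorrectly (or has never seen) still maps onto known character vectors, and the fixed-length features can be passed to any classifier. On real data, a library implementation such as gensim's `Word2Vec` trained on character lists would replace the toy training loop.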

Key words: Character-level vector, Feature extraction method, Railway short text, Text classification

CLC Number: U229