Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 122-126.

• Intelligent Computing •

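Stock Text Theme Recognition Based on Deep Fusion

ZHANG Jia-hui, CHEN Zhi-yuan, ZHAO Feng, AN Zhi-yong, XIE Qing-song

  1. (School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China)
  • Online: 2019-11-10 Published: 2019-11-20
  • Corresponding author: XIE Qing-song (born 1965), male, professor, master's supervisor, CCF member. His main research interests are financial data analysis and intelligent algorithms. E-mail: xieqingsong@sdtbu.edu
  • About the author: ZHANG Jia-hui (born 1995), female, master's student. Her main research interests are financial data analysis and mining.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61773244), the Yantai Key Research and Development Program (2017ZH065, 2019XDHZ081), the CERNET Next Generation Internet Technology Innovation Project (NGII20170626), and the Graduate Science and Technology Innovation Fund of Shandong Technology and Business University (3110318).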

Abstract: The stock market occupies an important position in the capital market and is a barometer of the economy. Expert commentary on stocks is an important basis for investors' decisions, so quickly and effectively capturing the topic information in large numbers of expert stock reviews has become a hot spot in stock research. However, most current stock text topic recognition algorithms use a single criterion for feature selection and a single classification model. A single criterion reflects topic recognition performance from only one perspective and cannot fully capture the main features of the target. In fact, different feature selection criteria and classifier models interpret text from different perspectives, and the features they capture are strongly complementary. To improve the accuracy of stock text topic recognition, this paper fuses stock texts at multiple levels from the perspective of information fusion: 1) at the feature selection layer, multiple feature selection methods are combined by weighted fusion so that the selected features fully characterize stock texts; 2) at the decision layer, multiple classifiers are fused on the basis of SVM-score to improve recognition accuracy. Experiments on measured data show that the proposed multi-layer fusion algorithm achieves significantly higher recognition accuracy than single-mode text topic recognition methods.

Key words: Feature fusion, Feature selection, Subject recognition, SVM-score, Text categorization
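
The two fusion layers described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual selectors, weights, or fusion rule, so everything below is an assumption: the function names (`min_max`, `fuse_feature_scores`, `fuse_decisions`), the min-max normalization, the weighted-sum rule at both layers, and the toy scores are all hypothetical and chosen only to show the general idea of weighted score fusion.

```python
from typing import Dict, List, Sequence


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one selector's scores to [0, 1] so selectors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def fuse_feature_scores(
    selector_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
    top_k: int,
) -> List[str]:
    """Feature selection layer (assumed rule): weighted sum of normalized scores."""
    fused: Dict[str, float] = {}
    for scores, w in zip(selector_scores, weights):
        for term, s in min_max(scores).items():
            fused[term] = fused.get(term, 0.0) + w * s
    # Rank by fused score (descending), breaking ties alphabetically.
    ranked = sorted(fused, key=lambda t: (-fused[t], t))
    return ranked[:top_k]


def fuse_decisions(
    classifier_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> str:
    """Decision layer (assumed rule): weighted sum of per-class scores, then argmax."""
    total: Dict[str, float] = {}
    for scores, w in zip(classifier_scores, weights):
        for label, s in scores.items():
            total[label] = total.get(label, 0.0) + w * s
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(total), key=lambda label: total[label])


# Made-up term scores from two hypothetical feature selectors.
selector_a = {"dividend": 9.0, "rally": 6.0, "the": 1.0}
selector_b = {"dividend": 0.2, "rally": 0.5, "the": 0.1}
selected = fuse_feature_scores([selector_a, selector_b], weights=[0.5, 0.5], top_k=2)

# Made-up per-class decision scores from two hypothetical classifiers.
clf_a = {"bullish": 0.8, "bearish": -0.3}
clf_b = {"bullish": -0.1, "bearish": 0.4}
predicted = fuse_decisions([clf_a, clf_b], weights=[0.6, 0.4])
```

In this toy run the two selectors disagree on the top term and the two classifiers disagree on the label, so the fused result depends on the weights. That is the complementarity the abstract appeals to, although the paper's own weighting scheme may differ from this sketch.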

CLC Number: 

  • TP391