Computer Science, 2022, Vol. 49, Issue (11A): 211000017-7. doi: 10.11896/jsjkx.211000017

• Artificial Intelligence •

Empirical Study on the Forecast of Large Stock Dividends of Listed Companies Based on DE-lightGBM

CEN Jian-ming1,2, FENG Quan-xi1,2, ZHANG Li-li1, TONG Rui-chao1

  1. 1 College of Science, Guilin University of Technology, Guilin, Guangxi 541004, China
    2 Guangxi Colleges and Universities Key Laboratory of Applied Statistics, Guilin, Guangxi 541004, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: FENG Quan-xi (fqx9904@163.com)
  • About author: CEN Jian-ming (cenjianming0819@foxmail.com), born in 1994, postgraduate. His main research interests include intelligent algorithms and machine learning.
    FENG Quan-xi, born in 1980, Ph.D., professor. His main research interests include computational intelligence and machine learning in real-world applications.
  • Supported by:
    National Natural Science Foundation of China (62166015, 61763008, 62166013) and Key Science and Technology Project of Fangchenggang City (Fangcaijiao [2014] No. 42).

Abstract: A "large stock dividend" refers to a listed company distributing a large proportion of additional shares to its shareholders. To predict which listed companies will implement large stock dividends, this paper proposes a lightGBM model whose hyperparameters are optimized by the differential evolution algorithm (denoted DE-lightGBM). The model has two main components. First, differential evolution is used to adjust the minority-class weight and the regularization coefficient in the lightGBM loss function, which addresses the class imbalance in the data. Second, with F1 and AUC as evaluation metrics, differential evolution is applied again to optimize the important hyperparameters of lightGBM and to find the parameter combination with the best predictive performance. Numerical results show that DE-lightGBM performs well, with F1 and AUC values of 0.5368 and 0.8734, respectively. The proposed DE-lightGBM model can effectively identify listed companies that will implement large stock dividends in the following year.

Key words: Large stock dividends, Differential evolution, LightGBM, Imbalanced data processing, Machine learning
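
The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a textbook DE/rand/1/bin scheme, and a hypothetical stand-in objective (`toy_objective`, a two-parameter threshold rule on synthetic imbalanced data built by the hypothetical helper `make_toy_data`). In the paper's setting, the objective would instead train lightGBM with the candidate minority-class weight, regularization coefficient, or other hyperparameters and return a validation score such as F1 or AUC. The population size, `f`, `cr`, bounds, and seeds below are all assumptions chosen for illustration.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=60, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            score = objective(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if score <= fitness[i]:
                pop[i], fitness[i] = trial, score
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def make_toy_data(n=300, pos_rate=0.1, seed=1):
    """Synthetic imbalanced data (hypothetical): about 10% positives."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        rows.append((rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0)))
        labels.append(y)
    return rows, labels


ROWS, LABELS = make_toy_data()


def toy_objective(params):
    """Stand-in for 'train a model with these settings, return -F1'."""
    alpha, threshold = params
    preds = [1 if x1 + alpha * x2 > threshold else 0 for x1, x2 in ROWS]
    return -f1_score(LABELS, preds)  # DE minimizes, so negate F1


BOUNDS = [(0.0, 2.0), (-1.0, 4.0)]  # assumed search ranges
best_params, best_value = differential_evolution(toy_objective, BOUNDS)
print("best params:", best_params, "F1:", -best_value)
```

Negating F1 turns the maximization into the minimization that this DE loop performs. Replacing `toy_objective` with a function that fits a real model would keep the rest of the sketch unchanged.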

CLC Number: 

  • TP181