Computer Science ›› 2022, Vol. 49 ›› Issue (12): 195-204. doi: 10.11896/jsjkx.210600029

• Database & Big Data & Data Science •

Integrating XGBoost and SHAP Model for Football Player Value Prediction and Characteristic Analysis

LIAO Bin1, WANG Zhi-ning2, LI Min2, SUN Rui-na2,3,4

  1. 1 College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China
    2 College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China
    3 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
    4 School of Networks Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-06-03 Revised: 2021-10-22 Published: 2022-12-14
  • Corresponding author: WANG Zhi-ning (740216422@qq.com)
  • Author contact: liaobin665@163.com
  • About author: LIAO Bin, born in 1986, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include deep learning, data mining, and big data computing models. WANG Zhi-ning, born in 1994, postgraduate. His main research interests include machine learning and big data.
  • Supported by:
    National Natural Science Foundation of China (61562078); Xinjiang “Tianshan Cedar Plan” Young Top Talent Reserve Project: Research on Machine Learning Frontier Algorithms and Their Applications; Scientific Research Program of Colleges and Universities in Xinjiang (XJEDU2021Y037).

Abstract: With the increasing globalization of football, the global player transfer market continues to grow. However, player value, the most important factor in transfer transactions, still lacks in-depth modeling and application research. This paper takes FIFA's official player database as the research object. First, with player positions treated separately, Box-Cox transformation, F-Score feature selection, and related methods are applied to process the features of the original data set. Second, a player value prediction model is built with XGBoost and compared against mainstream machine learning algorithms such as Random Forest, AdaBoost, GBDT, and SVR in 10-fold cross-validation experiments. The results show that the XGBoost model outperforms these models on the three indicators R², MAE, and RMSE. Finally, building on the value prediction model, the SHAP framework is integrated to analyze the important factors affecting player value at different positions, providing decision support for scenarios such as player value evaluation, comparative value analysis, and training strategy formulation.

Key words: Machine learning, Player value prediction, Training strategy, XGBoost algorithm, SHAP value

CLC Number:

  • TP391
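
The evaluation protocol named in the abstract (Box-Cox transformation, 10-fold cross-validation, and the R², MAE, and RMSE indicators) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `box_cox`, `k_fold_indices`, and `regression_metrics` are hypothetical, the Box-Cox parameter is passed in by hand rather than estimated, the folds are contiguous and unshuffled for simplicity, and the XGBoost model and SHAP analysis themselves are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def box_cox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform: log(v) when lam == 0, else (v**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds of near-equal size.

    Returns one (train_indices, test_indices) pair per fold.
    Assumption: no shuffling, which a real experiment would normally add.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R2, MAE, and RMSE for paired targets and predictions.

    R2 is undefined (division by zero) when all targets are identical.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"R2": 1.0 - ss_res / ss_tot, "MAE": mae, "RMSE": rmse}
```

In a full experiment, each `(train, test)` pair would be used to fit a regressor on the transformed training targets and score it on the held-out fold, with the three metrics averaged over the ten folds.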