Computer Science ›› 2026, Vol. 53 ›› Issue (6): 339-349. doi: 10.11896/jsjkx.250900068

• Database & Big Data & Data Science •

Data Price Prediction and Interpretability Analysis Based on GWO-XGBoost Model and SHAP Values

YANG Jian1,2, CAO Nan1, JIN Dayi1, ZHANG Jiaqi1, YANG Taotao1   

  1 School of Information, Shanxi University of Finance and Economics, Taiyuan 030006, China
  2 Shanxi Key Laboratory of Data Element Innovation and Economic Decision Analysis, Taiyuan 030006, China
  • Received: 2025-09-11 Revised: 2025-11-24 Online: 2026-06-15 Published: 2026-06-09
  • About author: YANG Jian, born in 1987, associate professor, master's supervisor. His main research interests include human-centered computing, machine learning and data pricing.
  • Supported by:
    National Social Science Foundation of China (23BJY205), Ministry of Education Humanities and Social Science Project (21YJCZH197) and Shanxi Provincial Research Foundation for Basic Research (202303021221184).

Abstract: With the rapid development of data marketization, data pricing has become a critical issue. To address the opacity and limited interpretability of existing pricing mechanisms, this paper proposes a data pricing model in which XGBoost is optimized by the grey wolf algorithm (GWO). First, a raw dataset is obtained from the Youyi Data platform and subjected to descriptive statistical analysis. The data are then preprocessed through outlier removal, one-hot encoding, logarithmic transformation, and normalization, and feature correlation is analyzed using the Spearman correlation coefficient. Finally, GWO is used to tune the XGBoost hyperparameters to improve the model's predictive performance. Experimental results indicate that the GWO-XGBoost model achieves a coefficient of determination (R²) of 0.971, significantly outperforming five baseline models. The model also achieves significant improvements in mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) over traditional hyperparameter optimization methods such as grid search and random search. Furthermore, SHAP interpretability analysis is used to examine the model's predictions in depth from both global and local perspectives, identifying the data update interval as the dominant factor influencing the predictions, contributing 95.16% of the total prediction increment. This research provides a scientific and rational mechanism for data pricing and a clear direction for subsequent model optimization, which is of great significance for promoting the healthy development of the data element market.

Key words: Data pricing, Grey wolf algorithm, XGBoost algorithm, SHAP analysis

CLC Number: TP391
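
The abstract states that GWO tunes the XGBoost hyperparameters but does not give the update rules, search ranges, or objective. The following is a minimal sketch of a basic grey wolf optimizer under stated assumptions, not the authors' implementation: `gwo_minimize`, `toy_validation_error`, the two hyperparameters, and their bounds are all hypothetical, and the objective is a simple stand-in for a cross-validated XGBoost error so that the block runs with the standard library alone.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimize `objective` over the box `bounds` with a basic grey wolf optimizer.

    Requires n_wolves >= 3, since the three best wolves guide the pack.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(n_iters):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead.
        scored = sorted(((objective(w), w) for w in wolves), key=lambda p: p[0])
        if scored[0][0] < best_score:
            best_score, best_pos = scored[0][0], list(scored[0][1])
        leaders = [list(w) for _, w in scored[:3]]

        # Control parameter a decays linearly from 2 toward 0,
        # shifting the pack from exploration to exploitation.
        a = 2.0 * (1 - t / n_iters)

        new_wolves = []
        for w in wolves:
            new_w = []
            for d in range(dim):
                candidates = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    candidates.append(leader[d] - A * D)
                # New coordinate is the mean of the three leader-guided moves,
                # clipped back into the search box.
                x = sum(candidates) / 3
                lo, hi = bounds[d]
                new_w.append(min(max(x, lo), hi))
            new_wolves.append(new_w)
        wolves = new_wolves

    # Score the final pack so the last move is not lost.
    for w in wolves:
        score = objective(w)
        if score < best_score:
            best_score, best_pos = score, list(w)
    return best_pos, best_score


def toy_validation_error(params):
    """Stand-in objective (assumption): pretends the error is lowest at
    learning_rate = 0.1 and max_depth = 6. A real objective would train
    XGBoost with `params` and return a validation error such as RMSE."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Hypothetical search ranges for (learning_rate, max_depth).
bounds = [(0.01, 0.3), (3.0, 10.0)]
best_params, best_err = gwo_minimize(toy_validation_error, bounds)
print(best_params, best_err)
```

To use this with a real model, one would replace `toy_validation_error` with a function that fits XGBoost on the training folds and returns the validation error, rounding integer-valued hyperparameters such as tree depth before fitting.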