Computer Science ›› 2026, Vol. 53 ›› Issue (4): 235-244. doi: 10.11896/jsjkx.250600043

• Database & Big Data & Data Science •

Long-term Causal Effect Estimation Based on Deep Reinforcement Learning

LIU Jiaqi1,2, WANG Yujie1,2, XIANG Guodu1,2, YU Kui1,2, CAO Fuyuan3   

  1 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
    2 Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei 230601, China
    3 School of Computer and Information Technology (School of Big Data), Shanxi University, Taiyuan 030006, China
  • Received: 2025-06-08  Revised: 2025-11-02  Online: 2026-04-15  Published: 2026-04-08
  • About author: LIU Jiaqi, born in 2003, postgraduate. His main research interests include causal effect estimation and reinforcement learning.
    YU Kui, born in 1977, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.14259M). His main research interest is causal inference.
  • Supported by:
    National Science and Technology Major Project of the Ministry of Science and Technology of China (2021ZD0111801) and National Natural Science Foundation of China (62376087).

Abstract: Causal effect estimation aims to quantify the magnitude of the causal effect of a treatment variable on an outcome variable. Existing mainstream causal effect estimation methods are mainly applicable to static data or to a single time point in a time series, and cannot effectively estimate the cumulative impact of the treatment variable on the outcome variable over a long period of time. To address this problem, the long-term causal effect estimation method based on traditional reinforcement learning fits long-term potential outcomes with linear basis functions and thereby computes the long-term causal effect. However, because linear basis functions have limited expressive power in complex scenarios, existing methods cannot accurately identify weak causal effects and suffer significant performance degradation as the data dimension increases. To tackle these problems, this paper proposes a long-term causal effect estimation method based on deep reinforcement learning. The method uses a dueling network to estimate long-term potential outcomes, which effectively captures the impact of the treatment variable on the outcome variable and thus greatly improves the algorithm's ability to identify weak causal effects. At the same time, the proposed method avoids the bias introduced into long-term potential outcome estimation by improper selection of basis functions. Experimental results show that the proposed method outperforms existing algorithms on statistical synthetic datasets and order-scheduling simulation datasets.
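As a rough illustration of the dueling-network idea summarized in the abstract, the sketch below reads the long-term potential outcome under each treatment off a dueling Q-network and takes the long-term effect as their difference. This is a hedged sketch rather than the authors' implementation: the class name DuelingValueNet, the helper long_term_effect, the hidden sizes, and the convention that action 1 denotes treatment and action 0 denotes control are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's released code) of estimating a
# long-term causal effect with a dueling network.
import torch
import torch.nn as nn

class DuelingValueNet(nn.Module):
    """Dueling architecture: a shared encoder feeds a state-value stream
    and an action-advantage stream; their combination yields Q(s, a)."""
    def __init__(self, state_dim: int, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value_stream = nn.Linear(hidden, 1)              # V(s)
        self.advantage_stream = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.encoder(state)
        v = self.value_stream(h)
        a = self.advantage_stream(h)
        # Standard dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
        return v + a - a.mean(dim=-1, keepdim=True)

def long_term_effect(q_net: DuelingValueNet, states: torch.Tensor) -> torch.Tensor:
    """Average long-term effect as the difference between estimated cumulative
    potential outcomes under treatment (action 1) and control (action 0)."""
    q = q_net(states)                  # shape: (batch, 2)
    return (q[:, 1] - q[:, 0]).mean()
```

The dueling aggregation Q = V + (A - mean A) follows Wang et al.'s dueling network architecture; splitting the state-value and advantage streams lets the network represent small differences between treatment and control, which is the property the abstract relies on for identifying weak causal effects.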

Key words: Long-term causal effect estimation, Potential outcome model, Deep reinforcement learning

CLC Number: TP181