Computer Science ›› 2022, Vol. 49 ›› Issue (9): 33-40. doi: 10.11896/jsjkx.220300158

• Database & Big Data & Data Science •

Generative Link Tree: A Counterfactual Explanation Generation Approach with High Data Fidelity

WANG Ming, WU Wen-fang, WANG Da-ling, FENG Shi, ZHANG Yi-fei

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  • Received: 2022-03-16  Revised: 2022-05-30  Online: 2022-09-15  Published: 2022-09-09
  • Corresponding author: WANG Da-ling (wangdaling@cse.neu.edu.cn)
  • About author: WANG Ming (2001819@stu.neu.edu.cn), born in 1997, postgraduate, is a student member of China Computer Federation. His main research interests include interpretable machine learning and counterfactual explanation.
    WANG Da-ling, born in 1962, Ph.D, professor, Ph.D supervisor, is a senior member of China Computer Federation. Her main research interests include social media processing, interpretable dialogue generation and sentiment analysis.
  • Supported by: National Natural Science Foundation of China (62172086, 61872074).

Abstract: The huge scale of Internet data and the complex structure of deep models deliver excellent performance in processing and applying such data, but they reduce the interpretability of artificial intelligence (AI) systems. Counterfactual explanations (CE), a special kind of explanation in interpretability research, have attracted much attention from researchers. Besides serving as explanations, counterfactual explanations can also be regarded as generated data. From an application point of view, this paper proposes an approach for generating counterfactual explanations with high data fidelity, called the generative link tree (GLT), which uses a divide-and-conquer strategy and a local greedy strategy to construct counterfactual explanations from cases appearing in the training data. The paper also summarizes existing methods for generating counterfactual explanations and selects popular datasets to verify the GLT method. In addition, a metric called "data fidelity" (DF) is proposed to evaluate the validity and potential applicability of counterfactual explanations when they are used as data. The data fidelity of the counterfactual explanations generated by GLT is significantly higher than that of the explanations generated by the baseline models.

Key words: Interpretability, Filling type, Counterfactual explanations, Data fidelity
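Although the paper's full algorithm is not reproduced on this page, the idea stated in the abstract, building a counterfactual by greedily borrowing feature values from cases in the training data and then checking how "real" the result looks, can be illustrated with a small sketch. The Python snippet below is a minimal illustration under stated assumptions, not the GLT algorithm itself: the functions case_based_counterfactual and data_fidelity, the greedy feature ordering, and the synthetic dataset are all hypothetical names introduced here, and the paper's divide-and-conquer tree construction and its DF definition may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def case_based_counterfactual(model, x, X_train, target_class):
    """Greedily copy feature values from the closest training case that the
    model already assigns to target_class, until the prediction for x flips."""
    donors = X_train[model.predict(X_train) == target_class]
    donor = donors[np.argmin(np.linalg.norm(donors - x, axis=1))]

    cf = x.copy()
    changed = []
    # try first the features on which x and the donor disagree the most
    for j in np.argsort(-np.abs(donor - x)):
        if model.predict(cf.reshape(1, -1))[0] == target_class:
            break
        cf[j] = donor[j]          # local greedy step: adopt the donor's value
        changed.append(j)
    return cf, changed

def data_fidelity(cf, X_train, changed, tol=1e-6):
    """Simplified, feature-level fidelity: fraction of modified values that
    also occur somewhere in the training data for the same feature."""
    if not changed:
        return 1.0
    hits = sum(np.any(np.abs(X_train[:, j] - cf[j]) < tol) for j in changed)
    return hits / len(changed)

# toy usage on synthetic data
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]
target = 1 - clf.predict(x.reshape(1, -1))[0]
cf, changed = case_based_counterfactual(clf, x, X, target)
print("flipped to:", clf.predict(cf.reshape(1, -1))[0],
      "fidelity:", data_fidelity(cf, X, changed))
```

Because every changed value is copied from an observed training case, this style of edit keeps the feature-level fidelity score high by construction, which matches the intuition behind generating counterfactual explanations from cases in the training data.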

CLC Number: TP391
[1]GUNNING D,STEFIK M,CHOI J,et al.XAI-Explainable artificial intelligence[J].Science Robotics,2019,4(37):eaay7120.
[2]WACHTER S,MITTELSTADT B,RUSSELL C.Counterfactual explanations without opening the black box:Automated decisions and the GDPR[J].Harvard Journal of Law & Technology,2017,31:841-887.
[3]PEARL J,MACKENZIE D.The book of why:the new science of cause and effect[M].Basic Books,2018.
[4]MOLNAR C.Interpretable machine learning[M].Lulu.com,2020.
[5]VELMURUGAN M,OUYANG C,MOREIRA C,et al.Evaluating fidelity of explainable methods for predictive process analytics[C]//International Conference on Advanced Information Systems Engineering.Cham:Springer,2021:64-72.
[6]YUE Z,WANG T,SUN Q,et al.Counterfactual zero-shot and open-set visual recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:15404-15414.
[7]MOTHILAL R K,SHARMA A,TAN C.Explaining machine learning classifiers through diverse counterfactual explanations[C]//Proceedings of the 2020 Conference on Fairness,Accountability,and Transparency.2020:607-617.
[8]VERMA S,DICKERSON J,HINES K.Counterfactual explanations for machine learning:A review[J].arXiv:2010.10596,2020.
[9]KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[10]USTUN B,SPANGHER A,LIU Y.Actionable recourse in linear classification[C]//Proceedings of the Conference on Fairness,Accountability,and Transparency.2019:10-19.
[11]POYIADZI R,SOKOL K,SANTOS-RODRIGUEZ R,et al.FACE:feasible and actionable counterfactual explanations[C]//Proceedings of the AAAI/ACM Conference on AI,Ethics,and Society.2020:344-350.
[12]KEANE M T,SMYTH B.Good counterfactuals and where to find them:A case-based technique for generating counterfactuals for explainable AI (XAI)[C]//International Conference on Case-Based Reasoning.Cham:Springer,2020:163-178.
[13]GOYAL Y,WU Z,ERNST J,et al.Counterfactual visual explanations[C]//International Conference on Machine Learning.PMLR,2019:2376-2384.
[14]LOOVEREN A V,KLAISE J.Interpretable counterfactual explanations guided by prototypes[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases.Cham:Springer,2021:650-665.
[15]SMYTH B,KEANE M T.A Few Good Counterfactuals:Generating Interpretable,Plausible and Diverse Counterfactual Explanations[J].arXiv:2101.09056,2021.
[16]KARIMI A H,SCHÖLKOPF B,VALERA I.Algorithmic recourse:from counterfactual explanations to interventions[C]//Proceedings of the 2021 ACM Conference on Fairness,Accountability,and Transparency.2021:353-362.
[17]GRATH R M,COSTABELLO L,VAN C L,et al.Interpretable credit application predictions with counterfactual explanations[J].arXiv:1811.05245,2018.
[18]RUSSELL C.Efficient search for diverse coherent explanations[C]//Proceedings of the Conference on Fairness,Accountability,and Transparency.2019:20-28.
[19]MAHAJAN D,TAN C,SHARMA A.Preserving causal constraints in counterfactual explanations for machine learning classifiers[J].arXiv:1912.03277,2019.
[20]KARIMI A H,BARTHE G,BALLE B,et al.Model-agnostic counterfactual explanations for consequential decisions[C]//International Conference on Artificial Intelligence and Statistics.PMLR,2020:895-905.
[21]KAUSHIK D,HOVY E,LIPTON Z C.Learning the difference that makes a difference with counterfactually-augmented data[J].arXiv:1909.12434,2019.
[22]ZHAO W,OYAMA S,KURIHARA M.Generating natural counterfactual visual explanations[C]//Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence.2021:5204-5205.
[23]GOMEZ O,HOLTER S,YUAN J,et al.ViCE:Visual counterfactual explanations for machine learning models[C]//Proceedings of the 25th International Conference on Intelligent User Interfaces.2020:531-535.
[24]KOHAVI R,BECKER B.Adult [EB/OL].2019.(1996-05-01).http://archive.ics.uci.edu/ml/datasets/Adult.
[25]HOFMANN H.Statlog (German Credit Data)[EB/OL].(1994-11-17).http://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).
[26]ASUNCION A,NEWMAN D.UCI machine learning repository [EB/OL].[2013-05-28].http://archive.ics.uci.edu/ml.
[27]MICCI-BARRECA D.A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems[J].ACM SIGKDD Explorations Newsletter,2001,3(1):27-32.
[28]BIAU G,SCORNET E.A random forest guided tour[J].Test,2016,25(2):197-227.
[29]SAFAVIAN S R,LANDGREBE D.A survey of decision tree classifier methodology[J].IEEE Transactions on Systems,Man,and Cybernetics,1991,21(3):660-674.
[30]RISH I.An empirical study of the naive Bayes classifier[J].IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence,2001,3(22):41-46.
[31]REFAEILZADEH P,TANG L,LIU H.Cross-validation[M].Encyclopedia of Database Systems,2009:532-538.