Computer Science ›› 2025, Vol. 52 ›› Issue (11): 90-97. doi: 10.11896/jsjkx.240900061

• Database & Big Data & Data Science •

Adversarial Generative Multi-sensitive Attribute Data Debiasing Method

WANG Wenpeng, GE Hongwei, LI Ting   

  1. Engineering Research Center of Intelligent Technology for Healthcare, Ministry of Education (Jiangnan University), Wuxi, Jiangsu 214122, China
    School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2024-09-10  Revised: 2024-12-16  Online: 2025-11-15  Published: 2025-11-06
  • Corresponding author: GE Hongwei (ghw8601@163.com)
  • About author: WANG Wenpeng, born in 1998, postgraduate (2734847275@qq.com). His main research interests include recommendation systems and machine learning.
    GE Hongwei, born in 1967, Ph.D, professor and Ph.D supervisor. His main research interests include artificial intelligence, pattern recognition, machine learning, and image processing and analysis.
  • Supported by:
    National Natural Science Foundation of China (61806006).


Abstract: This paper proposes an adversarial generative method for multi-sensitive attribute data debiasing, leveraging adversarial learning and an autoencoder to eliminate correlations between sensitive and non-sensitive attributes, minimize the loss of model accuracy incurred in pursuing fairness, and address debiasing over multiple sensitive attributes. For multi-sensitive attribute debiasing, the method partitions the data into groups according to the combined values of multiple sensitive attributes and improves the fairness of each group's predictions by eliminating the correlations between the groups and these sensitive-attribute combinations. To eliminate correlations between sensitive and non-sensitive attributes, the autoencoder is trained adversarially against a network that predicts the sensitive attributes; this training mechanism uncovers and removes latent sensitive-attribute-related information within the groups, significantly reducing bias while retaining the utility of the data. To mitigate the accuracy loss incurred in pursuing fairness and to optimize the balance between accuracy and fairness, a prediction network is introduced and its loss function is used as a constraint to strengthen the encoder's ability to extract information, ensuring that key information is captured more precisely during encoding and preventing an excessive sacrifice of predictive performance during debiasing. Data debiasing experiments are conducted on three real datasets, applying the encoded data to logistic regression models; fairness improves by 50.5% to 84%, validating the effectiveness of the method. Considering fairness, accuracy, and their balance together, the proposed method outperforms other debiasing algorithms.
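
To make the training mechanism described above concrete, below is a minimal sketch of the three-network setup, assuming a PyTorch implementation: an autoencoder whose encoder is trained adversarially against a head that predicts the combined sensitive-attribute group, while a task-prediction head's loss acts as the accuracy constraint. The paper's exact architecture, layer sizes, and loss weights are not given on this page, so all module names (Encoder, Decoder, make_head, train_step) and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the authors' code): encoder/decoder reconstruct
# the data, an adversary head predicts the combined sensitive-attribute
# group from the code z, and a predictor head keeps z informative about
# the task label. All sizes are assumptions.

class Encoder(nn.Module):
    def __init__(self, d_in, d_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_z))

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, d_z, d_in):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, z):
        return self.net(z)

def make_head(d_z, n_out):
    # small classification head, reused for the adversary and the predictor
    return nn.Sequential(nn.Linear(d_z, 32), nn.ReLU(), nn.Linear(32, n_out))

def train_step(x, y, s_group, enc, dec, adv, pred, opt_main, opt_adv,
               lam_adv=1.0, lam_pred=1.0):
    # Step 1: train the adversary to recover the combined sensitive group
    # from a detached code, so only the adversary's weights are updated.
    adv_loss = nn.functional.cross_entropy(adv(enc(x).detach()), s_group)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # Step 2: train encoder/decoder/predictor to reconstruct x, keep label
    # information (the predictor loss is the accuracy constraint), and
    # maximize the adversary's error (the debiasing term).
    z = enc(x)
    rec_loss = nn.functional.mse_loss(dec(z), x)
    pred_loss = nn.functional.cross_entropy(pred(z), y)
    fool_loss = -nn.functional.cross_entropy(adv(z), s_group)
    loss = rec_loss + lam_pred * pred_loss + lam_adv * fool_loss
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
    return rec_loss.item(), pred_loss.item(), adv_loss.item()

# Typical wiring (illustrative):
# enc, dec = Encoder(d_in, d_z), Decoder(d_z, d_in)
# adv, pred = make_head(d_z, n_groups), make_head(d_z, n_classes)
# opt_main = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *pred.parameters()], lr=1e-3)
# opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
```

The reported evaluation applies the encoded data to logistic regression and measures fairness across groups defined by combined sensitive-attribute values. The sketch below shows one such intersectional measure, a demographic-parity gap over the combined groups; the paper's exact fairness metric is not stated on this page, so this scikit-learn function is an assumed stand-in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def intersectional_dp_gap(z, y, sens):
    """Max-min gap in positive-prediction rate over groups formed by the
    combined values of the sensitive attributes (binary task assumed).

    z: (n, d) encoded features; y: (n,) labels;
    sens: (n, k) array whose rows hold each sample's k sensitive attributes."""
    clf = LogisticRegression(max_iter=1000).fit(z, y)
    y_hat = clf.predict(z)
    # one group id per distinct combination of sensitive-attribute values
    _, group_ids = np.unique(sens, axis=0, return_inverse=True)
    rates = [y_hat[group_ids == g].mean() for g in np.unique(group_ids)]
    return max(rates) - min(rates)
```

A successful debiasing method should yield a smaller gap on the encoded features than on the raw features; the 50.5% to 84% fairness improvements reported above would correspond to shrinking a gap of this kind.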

Key words: Data debiasing, Machine learning, Adversarial learning, Autoencoder

CLC number: TP391