Computer Science (计算机科学), 2022, Vol. 49, Issue 6A: 184-190. doi: 10.11896/jsjkx.210400234
徐国宁1, 陈奕芃1, 陈一鸣1, 陈晋音1,2, 温浩3
XU Guo-ning1, CHEN Yi-peng1, CHEN Yi-ming1, CHEN Jin-yin1,2, WEN Hao3
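Abstract: Deep learning is widely used in image recognition, natural language processing, financial forecasting, and other fields. Biased analysis results harm both individuals and groups, so it is essential to improve a model's fairness without degrading its performance. Bias in data is not confined to the sensitive attributes: because attributes are correlated, non-sensitive attributes also carry biased information, so debiasing algorithms that consider only the sensitive attributes still leave bias behind. To keep the sensitive information carried by correlated attributes from biasing the classification results of deep learning, this paper proposes a data debiasing method based on generative adversarial networks. The model's loss function combines a fairness constraint with an accuracy loss; adversarial encoding removes the biased information to generate a debiased dataset; and alternating adversarial training of the generator and the discriminator reduces the loss of unbiased information in the dataset. The method thus removes bias from the data while preserving the classification accuracy of the main task, which improves the fairness of subsequent classification tasks. Finally, data debiasing experiments on several real-world datasets verify the effectiveness of the proposed algorithm.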
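The abstract states only that the loss combines a fairness constraint with an accuracy loss and that the generator and discriminator are trained alternately; it does not give the formula. As a rough illustration of how such a combined objective is commonly written in adversarial debiasing, the sketch below assumes binary labels, a binary sensitive attribute, and a weighting factor `lam`. The names `bce`, `generator_loss`, and `lam` are hypothetical and are not taken from the paper.

```python
import math


def bce(probs, labels):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def generator_loss(task_probs, task_labels, adv_probs, sensitive, lam=1.0):
    """Assumed combined objective for the generator (encoder).

    accuracy term: cross-entropy of the main-task classifier on the
                   debiased data (lower is better).
    fairness term: cross-entropy of the discriminator that tries to recover
                   the sensitive attribute from the debiased data; the
                   generator is rewarded when this is HIGH, hence the minus.
    """
    accuracy_loss = bce(task_probs, task_labels)
    adversary_loss = bce(adv_probs, sensitive)
    return accuracy_loss - lam * adversary_loss


# Same task predictions, two different discriminators.
task_probs, task_labels = [0.9, 0.1], [1, 0]
sensitive = [1, 0]
blind = generator_loss(task_probs, task_labels, [0.5, 0.5], sensitive)
leaky = generator_loss(task_probs, task_labels, [0.9, 0.1], sensitive)
print(blind < leaky)  # the generator prefers data the discriminator cannot read
```

In an alternating scheme, the discriminator would be updated to minimize its own cross-entropy on the sensitive attribute, and the generator would then be updated to minimize `generator_loss`; `lam` trades accuracy against fairness under these assumptions.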