Data Debiasing Method Based on Constrained Optimized Generative Adversarial Networks

doi:10.11896/jsjkx.210400234

Computer Science, 2022, Vol. 49, Issue 6A: 184-190


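XU Guo-ning¹, CHEN Yi-peng¹, CHEN Yi-ming¹, CHEN Jin-yin¹,², WEN Hao³

1 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2 Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China
3 Chongqing Zhongke Yuncong Technology Limited Company, Chongqing 400000, China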

Published: 2022-06-10  Online: 2022-06-08
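Corresponding author: CHEN Jin-yin (chenjinyin@zjut.edu.cn)
Author contact: xubo3006@163.com
Supported by:
National Natural Science Foundation of China (62072406), Natural Science Foundation of Zhejiang Province, China (LY19F020025), Major Special Funding for “Science and Technology Innovation 2025” in Ningbo (2018B10063) and Ministry of Education Cooperative Education Project.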

About author: XU Guo-ning, born in 1999. His main research interests include deep learning and artificial intelligence.
CHEN Jin-yin, born in 1982, Ph.D., professor. Her main research interests include artificial intelligence security, graph data mining and evolutionary computing.


Abstract: Deep learning is widely applied in image recognition, natural language processing, financial prediction and other fields. Once its analysis results are biased, they can harm both individuals and groups, so it is vital to improve model fairness without degrading the performance of the deep learning model. Bias in data is not carried by sensitive attributes alone: because attributes are correlated, non-sensitive attributes can also carry biased information, so debiasing algorithms that consider only sensitive attributes still leave bias behind. To eliminate the bias that sensitive information in correlated attributes introduces into the classification results of deep learning models, this paper proposes a data debiasing method based on generative adversarial networks. The model's loss function combines two constraints, a fairness constraint and an accuracy loss, and uses adversarial encoding to remove biased information and generate a debiased dataset. Alternating game training of the generator and the discriminator reduces the loss of unbiased information in the dataset, so that the bias in the data is eliminated while the classification accuracy of the main task is preserved, which improves the fairness of subsequent classification tasks. Finally, data debiasing experiments on several real-world datasets verify the effectiveness of the proposed algorithm. The results show that the proposed method can effectively reduce the biased information in datasets and generate datasets with less bias.

Key words: Adversarial training, Data debiasing, Deep learning, Generative adversarial networks
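The abstract describes a loss that combines an accuracy term with a fairness constraint enforced adversarially, trained by alternating generator and discriminator updates. This page does not give the architecture or the exact loss, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. The "generator" is assumed to be a linear encoder z = XW that produces the transformed dataset, a logistic classifier predicts the task label y from z, and a logistic discriminator tries to recover the sensitive attribute s from z. The generator and classifier minimise L_acc − λ·L_adv while the discriminator minimises L_adv, which is one common min-max formulation. The synthetic data, the helper names `make_data`, `train` and `dp_gap`, and the choice of demographic parity gap as the fairness measure are all hypothetical. The data deliberately leave s out of the features but keep a correlated proxy column, mirroring the abstract's point that non-sensitive attributes can carry bias.

```python
import numpy as np


def sigmoid(u):
    # Numerically stable logistic function: 1 / (1 + exp(-u)).
    return np.exp(-np.logaddexp(0.0, -u))


def bce(p, t, eps=1e-7):
    # Mean binary cross-entropy between probabilities p and 0/1 targets t.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))


def make_data(n=2000, seed=0):
    # Hypothetical synthetic data: s is sensitive and is NOT a feature,
    # but the non-sensitive "proxy" column is strongly correlated with it.
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, n).astype(float)
    proxy = s + 0.3 * rng.standard_normal(n)
    skill = rng.standard_normal(n)  # legitimate feature
    # Labels are biased: they depend on s as well as on skill.
    y = (skill + s + 0.3 * rng.standard_normal(n) > 0.5).astype(float)
    return np.column_stack([proxy, skill]), y, s


def dp_gap(pred, s):
    # Demographic parity gap |P(pred=1 | s=1) - P(pred=1 | s=0)|.
    return float(abs(pred[s == 1].mean() - pred[s == 0].mean()))


def train(X, y, s, lam=1.0, k=2, steps=2000, lr=0.1, seed=1):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.standard_normal((d, k))  # assumed linear generator: z = X @ W
    a, b = np.zeros(k), 0.0                # task classifier on z
    c, e = np.zeros(k), 0.0                # discriminator: predicts s from z
    for _ in range(steps):
        z = X @ W
        # (1) Discriminator step: minimise L_adv = BCE(D(z), s).
        g_v = (sigmoid(z @ c + e) - s) / n
        c = c - lr * (z.T @ g_v)
        e = e - lr * g_v.sum()
        # (2) Generator + classifier step: minimise L_acc - lam * L_adv,
        # using the discriminator just updated in step (1).
        g_u = (sigmoid(z @ a + b) - y) / n
        g_v = (sigmoid(z @ c + e) - s) / n
        grad_z = np.outer(g_u, a) - lam * np.outer(g_v, c)
        a = a - lr * (z.T @ g_u)
        b = b - lr * g_u.sum()
        W = W - lr * (X.T @ grad_z)
    z = X @ W  # the transformed ("debiased") dataset
    pred = (sigmoid(z @ a + b) > 0.5).astype(float)
    return {
        "z": z,
        "acc": float((pred == y).mean()),
        "dp_gap": dp_gap(pred, s),
        "adv_loss": bce(sigmoid(z @ c + e), s),
    }


X, y, s = make_data()
base = train(X, y, s, lam=0.0)  # accuracy only; s merely dropped from X
fair = train(X, y, s, lam=1.0)  # accuracy loss plus adversarial fairness term
print(f"baseline: acc={base['acc']:.3f} dp_gap={base['dp_gap']:.3f}")
print(f"debiased: acc={fair['acc']:.3f} dp_gap={fair['dp_gap']:.3f}")
```

Comparing `lam=0.0` (sensitive attribute simply dropped) with `lam=1.0` illustrates the intended trade-off between accuracy and fairness. How far the gap actually shrinks depends on λ, the learning rate and the number of steps, and a plain min-max objective like this one can oscillate rather than settle.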

CLC number: TP391


XU Guo-ning, CHEN Yi-peng, CHEN Yi-ming, CHEN Jin-yin, WEN Hao. Data Debiasing Method Based on Constrained Optimized Generative Adversarial Networks[J]. Computer Science, 2022, 49(6A): 184-190. https://doi.org/10.11896/jsjkx.210400234
