计算机科学 ›› 2023, Vol. 50 ›› Issue (2): 158-165.doi: 10.11896/jsjkx.211100279
徐夏, 张晖, 杨春明, 李波, 赵旭剑
XU Xia, ZHANG Hui, YANG Chunming, LI Bo, ZHAO Xujian
摘要: 最近,算法的公平性问题引起了机器学习领域学者的广泛讨论。鉴于谱聚类在现代数据科学中的广泛流行,研究谱聚类的算法公平性是一个至关重要的话题。现有的公平谱聚类算法主要存在两个缺点:1)公平性能差;2)仅在单个敏感属性下工作。文中将公平问题视为一种约束谱聚类问题,通过求解约束谱聚类的可行解集,提出了一种非规范化公平谱聚类方法(Unnormalized Fair Spectral Clustering,UFSC),用于提升公平性能。此外,文中还提出了一种适用于多个敏感属性约束的公平聚类算法(Multi-sensitive Attributes Fair Spectral Clustering,MFSC)。在多个真实数据集上进行了实验,结果表明,UFSC和MFSC算法比现有的公平谱聚类算法生成的聚类结果更加公平。
中图分类号:
[1]USHIODA A.Hierarchical clustering of words and application to NLP tasks[C]//Fourth Workshop on Very Large Corpora.1996. [2]WU Z,LEAHY R.An optimal graph theoretic approach to data clustering:Theory and its application to image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1993,15(11):1101-1113. [3]HUTTENHOWER C,FLAMHOLZ A I,LANDIS J N,et al.NearestNeighbor Networks:clustering expression data based on gene neighborhoods[J].BMC Bioinformatics,2007,8(1):1-13. [4]CHUANG K H,CHIU M J,LIN C C,et al.Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means[J].IEEE Transactions on Medical Imaging,1999,18(12):1117-1128. [5]IYER N S,KANDEL A,SCHNEIDER M.Feature-based fuzzy classification for interpretation of mammograms[J].Fuzzy Sets and Systems,2000,114(2):271-280. [6]DWORK C,HARDT M,PITASSI T,et al.Fairness throughawareness[C]//Proceedings of the 3rd Innovations in Theoretical Computer Science Conference.New York:ACM,2012:214-226. [7]KLEINBERG J,LAKKARAJU H,LESKOVEC J,et al.Human decisions and machine predictions[J].The Quarterly Journal of Economics,2018,133(1):237-293. [8]ROMEI A,RUGGIERI S.A multidisciplinary survey on dis-crimination analysis[J].The Knowledge Engineering Review,2014,29(5):582-638. [9]FELDMAN M,FRIEDLER S A,MOELLER J,et al.Certifying and removing disparate impact[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.New York:ACM,2015:259-268. [10]HARDT M,PRICE E,SREBRO N.Equality of opportunity in supervised learning[C]//Advances in Neural Information Processing Systems.2016:3315-3323. [11]NARAYANAN A.Translation tutorial:21 fairness definitions and their politics[C]//Proceedings of the Conference on Fairness Accountability Transp.New York,USA,2018. [12]ZAFAR M B,VALERA I,GOMEZ RODRIGUEZ M,et al.Fairness beyond disparate treatment & disparate impact:Lear-ning classification without disparate mistreatment[C]//Procee-dings of the 26th International Conference on World Wide Web.Republic and Canton of Geneva,CHE:International World Wide Web Conferences Steering Committee,2017:1171-1180. [13]HASHIMOTO T,SRIVASTAVA M,NAMKOONG H,et al.Fairness without demographics in repeated loss minimization[C]//International Conference on Machine Learning.PMLR,2018:1929-1938. [14]CHIERICHETTI F,KUMAR R,LATTANZI S,et al.Fair clustering through fairlets[C]//Advances in Neural Information Processing Systems.2017:5029-5037. [15]RÖSNER C,SCHMIDT M.Privacy Preserving Clustering with Constraints[C]//45th International Colloquium on Automata,Languages,and Programming(ICALP 2018).Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,2018. [16]BERCEA I O,GROß M,KHULLER S,et al.On the Cost of Essentially Fair Clusterings[C]//Approximation,Randomization,and Combinatorial Optimization.Algorithms and Techniques(APPROX/RANDOM 2019).Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,2019. [17]BACKURS A,INDYK P,ONAK K,et al.Scalable fair clustering[C]//International Conference on Machine Learning.PMLR,2019:405-413. [18]KLEINDESSNER M,SAMADI S,AWASTHI P,et al.Guarantees for spectral clustering with fairness constraints[C]//International Conference on Machine Learning.PMLR,2019:3458-3467. [19]BERA S K,CHAKRABARTY D,FLORES N J,et al.Fair algorithms for clustering[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:4954-4965. [20]AHMADIAN S,EPASTO A,KUMAR R,et al.Clusteringwithout over-representation[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:ACM,2019:267-275. [21]ABRAHAM S S,PADMANABHAN D,SUNDARAM S S.Fairness in Clustering with Multiple Sensitive Attributes[C]//EDBT/ICDT 2020 Joint Conference.2020:287-298. [22]SCHMIDT M,SCHWIEGELSHOHN C,SOHLER C.Fair coresets and streaming algorithms for fair k-means[C]//International Workshop on Approximation and Online Algorithms.Cham:Springer,2019:232-251. [23]AHMADIAN S,EPASTO A,KUMAR R,et al.Fair correlation clustering[C]//International Conference on Artificial Intelligence and Statistics.PMLR,2020:4195-4205. [24]KLEINDESSNER M,AWASTHI P,MORGENSTERN J.Fairk-center clustering for data summarization[C]//International Conference on Machine Learning.PMLR,2019:3448-3457. [25]ZIKO I M,GRANGER E,YUAN J,et al.Clustering with fairness constraints:A flexible and scalable approach[J].arXiv:1906.08207,2019. [26]CHEN X,FAIN B,LYU L,et al.Proportionally fair clustering[C]//International Conference on Machine Learning.PMLR,2019:1032-1041. [27]JUNG C,KANNAN S,LUTZ N.Service in Your Neighbor-hood:Fairness in Center Location[C]//1st Symposium on Foundations of Responsible Computing(FORC 2020).Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2020. [28]DAVIDSON I,RAVI S S.Making existing clusterings fairer:Algorithms,complexity results and insights[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:3733-3740. [29]GHADIRI M,SAMADI S,VEMPALA S.Socially fair k-means clustering[C]//Proceedings of the 2021 ACM Conference on Fairness,Accountability,and Transparency.2021:438-448. [30]ABBASI M,BHASKARA A,VENKATASUBRAMANIAN S.Fair clustering via equitable group representations[C]//Proceedings of the 2021 ACM Conference on Fairness,Accountabi-lity,and Transparency.2021:504-514. [31]MAKARYCHEV Y,VAKILIAN A.Approximation Algorithms for Socially Fair Clustering[J].arXiv:2103.02512,2021. [32]CHHABRA A,MOHAPATRA P.Fair algorithms for hierarchical agglomerative clustering[J].arXiv:2005.03197,2020. [33]AHMADIAN S,EPASTO A,KNITTEL M,et al.Fair Hierarchical Clustering[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems.2020:21050-21060. [34]QUY T L,ROY A,FRIEGE G,et al.Fair-Capacitated Clustering[J].arXiv:2104.12116,2021. [35]VON LUXBURG U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416. [36]YU S X,SHI J.Segmentation given partial grouping constraints[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(2):173-183. [37]BERTSEKAS D P.Nonlinear programming[J].Journal of the Operational Research Society,1997,48(3):334-334. [38]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905. [39]WEEKS M R,CLAIR S,BORGATTI S P,et al.Social networks of drug users in high-risk sites:Finding the connections[J].AIDS and Behavior,2002,6(2):193-206. [40]ZHOU Z H,CHEN Z Q.Hybrid decision tree[J].Knowledge-based systems,2002,15(8):515-528. [41]PALECHOR F M,DE LA HOZ MANOTAS A.Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia,Peru and Mexico[J].Data in Brief,2019,25:104344. [42]MORO S,CORTEZ P,RITA P.A data-driven approach to predict the success of bank telemarketing[J].Decision Support Systems,2014,62:22-31. [43]MEEK C,THIESSON B,HECKERMAN D.The learning-curve sampling method applied to model-based clustering[J].Journal of Machine Learning Research,2002,2(2):397-418. |
[1] | 彭钺峰, 赵波, 刘会, 安杨. 针对机器学习的成员推断攻击综述 Survey on Membership Inference Attacks Against Machine Learning 计算机科学, 2023, 50(3): 351-359. https://doi.org/10.11896/jsjkx.220100016 |
[2] | 王艺潭, 王一舒, 袁野. 学习索引研究综述 Survey of Learned Index 计算机科学, 2023, 50(1): 1-8. https://doi.org/10.11896/jsjkx.211000149 |
[3] | 陈得鹏, 刘肖, 崔杰, 何道敬. 面向机器学习的成员推理攻击综述 Survey of Membership Inference Attacks for Machine Learning 计算机科学, 2023, 50(1): 302-317. https://doi.org/10.11896/jsjkx.220800227 |
[4] | 冷典典, 杜鹏, 陈建廷, 向阳. 面向自动化集装箱码头的AGV行驶时间估计 Automated Container Terminal Oriented Travel Time Estimation of AGV 计算机科学, 2022, 49(9): 208-214. https://doi.org/10.11896/jsjkx.210700028 |
[5] | 宁晗阳, 马苗, 杨波, 刘士昌. 密码学智能化研究进展与分析 Research Progress and Analysis on Intelligent Cryptology 计算机科学, 2022, 49(9): 288-296. https://doi.org/10.11896/jsjkx.220300053 |
[6] | 何强, 尹震宇, 黄敏, 王兴伟, 王源田, 崔硕, 赵勇. 基于大数据的进化网络影响力分析研究综述 Survey of Influence Analysis of Evolutionary Network Based on Big Data 计算机科学, 2022, 49(8): 1-11. https://doi.org/10.11896/jsjkx.210700240 |
[7] | 李瑶, 李涛, 李埼钒, 梁家瑞, Ibegbu Nnamdi JULIAN, 陈俊杰, 郭浩. 基于多尺度的稀疏脑功能超网络构建及多特征融合分类研究 Construction and Multi-feature Fusion Classification Research Based on Multi-scale Sparse Brain Functional Hyper-network 计算机科学, 2022, 49(8): 257-266. https://doi.org/10.11896/jsjkx.210600094 |
[8] | 张光华, 高天娇, 陈振国, 于乃文. 基于N-Gram静态分析技术的恶意软件分类研究 Study on Malware Classification Based on N-Gram Static Analysis Technology 计算机科学, 2022, 49(8): 336-343. https://doi.org/10.11896/jsjkx.210900203 |
[9] | 陈明鑫, 张钧波, 李天瑞. 联邦学习攻防研究综述 Survey on Attacks and Defenses in Federated Learning 计算机科学, 2022, 49(7): 310-323. https://doi.org/10.11896/jsjkx.211000079 |
[10] | 李亚茹, 张宇来, 王佳晨. 面向超参数估计的贝叶斯优化方法综述 Survey on Bayesian Optimization Methods for Hyper-parameter Tuning 计算机科学, 2022, 49(6A): 86-92. https://doi.org/10.11896/jsjkx.210300208 |
[11] | 赵璐, 袁立明, 郝琨. 多示例学习算法综述 Review of Multi-instance Learning Algorithms 计算机科学, 2022, 49(6A): 93-99. https://doi.org/10.11896/jsjkx.210500047 |
[12] | 肖治鸿, 韩晔彤, 邹永攀. 基于多源数据和逻辑推理的行为识别技术研究 Study on Activity Recognition Based on Multi-source Data and Logical Reasoning 计算机科学, 2022, 49(6A): 397-406. https://doi.org/10.11896/jsjkx.210300270 |
[13] | 姚烨, 朱怡安, 钱亮, 贾耀, 张黎翔, 刘瑞亮. 一种基于异质模型融合的 Android 终端恶意软件检测方法 Android Malware Detection Method Based on Heterogeneous Model Fusion 计算机科学, 2022, 49(6A): 508-515. https://doi.org/10.11896/jsjkx.210700103 |
[14] | 王飞, 黄涛, 杨晔. 基于Stacking多模型融合的IGBT器件寿命的机器学习预测算法研究 Study on Machine Learning Algorithms for Life Prediction of IGBT Devices Based on Stacking Multi-model Fusion 计算机科学, 2022, 49(6A): 784-789. https://doi.org/10.11896/jsjkx.210400030 |
[15] | 许杰, 祝玉坤, 邢春晓. 机器学习在金融资产定价中的应用研究综述 Application of Machine Learning in Financial Asset Pricing:A Review 计算机科学, 2022, 49(6): 276-286. https://doi.org/10.11896/jsjkx.210900127 |
|