公平谱聚类方法用于提高簇的公平性

doi:10.11896/jsjkx.211100279

Abstract

Abstract: Recently,the fairness of the algorithm has aroused extensive discussion in the machine learning community.Given the widespread popularity of spectral clustering in modern data science,studying the algorithm fairness of spectral clustering is a crucial topic.Existing fair spectral clustering algorithms have two shortcomings:1) poor fairness performance;2) work only for single sensitive attribute.In this paper,the fair spectral clustering problem is regarded as a constrained spectral clustering problem.By solving the feasible solution set of constrained spectral clustering,an unnormalized fair spectral clustering(UFSC) method is proposed to improve fairness performance.In addition,the paper also proposes a fair clustering algorithm suitable for multiple sensitive attribute constraints.Experimental results on multiple real-world datasets demonstrate that the UFSC and MFSC are fairer than the existing fair spectral clustering algorithms.

Key words: Algorithm fairness, Fair spectral clustering, Constrained spectral clustering, Machine learning, Data analysis

CLC Number:

TP301

XU Xia, ZHANG Hui, YANG Chunming, LI Bo, ZHAO Xujian. Fair Method for Spectral Clustering to Improve Intra-cluster Fairness[J].Computer Science, 2023, 50(2): 158-165.

References

[1]USHIODA A.Hierarchical clustering of words and application to NLP tasks[C]//Fourth Workshop on Very Large Corpora.1996.
[2]WU Z,LEAHY R.An optimal graph theoretic approach to data clustering:Theory and its application to image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1993,15(11):1101-1113.
[3]HUTTENHOWER C,FLAMHOLZ A I,LANDIS J N,et al.NearestNeighbor Networks:clustering expression data based on gene neighborhoods[J].BMC Bioinformatics,2007,8(1):1-13.
[4]CHUANG K H,CHIU M J,LIN C C,et al.Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means[J].IEEE Transactions on Medical Imaging,1999,18(12):1117-1128.
[5]IYER N S,KANDEL A,SCHNEIDER M.Feature-based fuzzy classification for interpretation of mammograms[J].Fuzzy Sets and Systems,2000,114(2):271-280.
[6]DWORK C,HARDT M,PITASSI T,et al.Fairness throughawareness[C]//Proceedings of the 3rd Innovations in Theoretical Computer Science Conference.New York:ACM,2012:214-226.
[7]KLEINBERG J,LAKKARAJU H,LESKOVEC J,et al.Human decisions and machine predictions[J].The Quarterly Journal of Economics,2018,133(1):237-293.
[8]ROMEI A,RUGGIERI S.A multidisciplinary survey on dis-crimination analysis[J].The Knowledge Engineering Review,2014,29(5):582-638.
[9]FELDMAN M,FRIEDLER S A,MOELLER J,et al.Certifying and removing disparate impact[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.New York:ACM,2015:259-268.
[10]HARDT M,PRICE E,SREBRO N.Equality of opportunity in supervised learning[C]//Advances in Neural Information Processing Systems.2016:3315-3323.
[11]NARAYANAN A.Translation tutorial:21 fairness definitions and their politics[C]//Proceedings of the Conference on Fairness Accountability Transp.New York,USA,2018.
[12]ZAFAR M B,VALERA I,GOMEZ RODRIGUEZ M,et al.Fairness beyond disparate treatment & disparate impact:Lear-ning classification without disparate mistreatment[C]//Procee-dings of the 26th International Conference on World Wide Web.Republic and Canton of Geneva,CHE:International World Wide Web Conferences Steering Committee,2017:1171-1180.
[13]HASHIMOTO T,SRIVASTAVA M,NAMKOONG H,et al.Fairness without demographics in repeated loss minimization[C]//International Conference on Machine Learning.PMLR,2018:1929-1938.
[14]CHIERICHETTI F,KUMAR R,LATTANZI S,et al.Fair clustering through fairlets[C]//Advances in Neural Information Processing Systems.2017:5029-5037.
[15]RÖSNER C,SCHMIDT M.Privacy Preserving Clustering with Constraints[C]//45th International Colloquium on Automata,Languages,and Programming(ICALP 2018).Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,2018.
[16]BERCEA I O,GROß M,KHULLER S,et al.On the Cost of Essentially Fair Clusterings[C]//Approximation,Randomization,and Combinatorial Optimization.Algorithms and Techniques(APPROX/RANDOM 2019).Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,2019.
[17]BACKURS A,INDYK P,ONAK K,et al.Scalable fair clustering[C]//International Conference on Machine Learning.PMLR,2019:405-413.
[18]KLEINDESSNER M,SAMADI S,AWASTHI P,et al.Guarantees for spectral clustering with fairness constraints[C]//International Conference on Machine Learning.PMLR,2019:3458-3467.
[19]BERA S K,CHAKRABARTY D,FLORES N J,et al.Fair algorithms for clustering[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:4954-4965.
[20]AHMADIAN S,EPASTO A,KUMAR R,et al.Clusteringwithout over-representation[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:ACM,2019:267-275.
[21]ABRAHAM S S,PADMANABHAN D,SUNDARAM S S.Fairness in Clustering with Multiple Sensitive Attributes[C]//EDBT/ICDT 2020 Joint Conference.2020:287-298.
[22]SCHMIDT M,SCHWIEGELSHOHN C,SOHLER C.Fair coresets and streaming algorithms for fair k-means[C]//International Workshop on Approximation and Online Algorithms.Cham:Springer,2019:232-251.
[23]AHMADIAN S,EPASTO A,KUMAR R,et al.Fair correlation clustering[C]//International Conference on Artificial Intelligence and Statistics.PMLR,2020:4195-4205.
[24]KLEINDESSNER M,AWASTHI P,MORGENSTERN J.Fairk-center clustering for data summarization[C]//International Conference on Machine Learning.PMLR,2019:3448-3457.
[25]ZIKO I M,GRANGER E,YUAN J,et al.Clustering with fairness constraints:A flexible and scalable approach[J].arXiv:1906.08207,2019.
[26]CHEN X,FAIN B,LYU L,et al.Proportionally fair clustering[C]//International Conference on Machine Learning.PMLR,2019:1032-1041.
[27]JUNG C,KANNAN S,LUTZ N.Service in Your Neighbor-hood:Fairness in Center Location[C]//1st Symposium on Foundations of Responsible Computing(FORC 2020).Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2020.
[28]DAVIDSON I,RAVI S S.Making existing clusterings fairer:Algorithms,complexity results and insights[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:3733-3740.
[29]GHADIRI M,SAMADI S,VEMPALA S.Socially fair k-means clustering[C]//Proceedings of the 2021 ACM Conference on Fairness,Accountability,and Transparency.2021:438-448.
[30]ABBASI M,BHASKARA A,VENKATASUBRAMANIAN S.Fair clustering via equitable group representations[C]//Proceedings of the 2021 ACM Conference on Fairness,Accountabi-lity,and Transparency.2021:504-514.
[31]MAKARYCHEV Y,VAKILIAN A.Approximation Algorithms for Socially Fair Clustering[J].arXiv:2103.02512,2021.
[32]CHHABRA A,MOHAPATRA P.Fair algorithms for hierarchical agglomerative clustering[J].arXiv:2005.03197,2020.
[33]AHMADIAN S,EPASTO A,KNITTEL M,et al.Fair Hierarchical Clustering[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems.2020:21050-21060.
[34]QUY T L,ROY A,FRIEGE G,et al.Fair-Capacitated Clustering[J].arXiv:2104.12116,2021.
[35]VON LUXBURG U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416.
[36]YU S X,SHI J.Segmentation given partial grouping constraints[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(2):173-183.
[37]BERTSEKAS D P.Nonlinear programming[J].Journal of the Operational Research Society,1997,48(3):334-334.
[38]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[39]WEEKS M R,CLAIR S,BORGATTI S P,et al.Social networks of drug users in high-risk sites:Finding the connections[J].AIDS and Behavior,2002,6(2):193-206.
[40]ZHOU Z H,CHEN Z Q.Hybrid decision tree[J].Knowledge-based systems,2002,15(8):515-528.
[41]PALECHOR F M,DE LA HOZ MANOTAS A.Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia,Peru and Mexico[J].Data in Brief,2019,25:104344.
[42]MORO S,CORTEZ P,RITA P.A data-driven approach to predict the success of bank telemarketing[J].Decision Support Systems,2014,62:22-31.
[43]MEEK C,THIESSON B,HECKERMAN D.The learning-curve sampling method applied to model-based clustering[J].Journal of Machine Learning Research,2002,2(2):397-418.

Related Articles 15

[1]	PENG Yuefeng, ZHAO Bo, LIU Hui, AN Yang. Survey on Membership Inference Attacks Against Machine Learning [J]. Computer Science, 2023, 50(3): 351-359.
[2]	WANG Yitan, WANG Yishu, YUAN Ye. Survey of Learned Index [J]. Computer Science, 2023, 50(1): 1-8.
[3]	CHEN Depeng, LIU Xiao, CUI Jie, HE Daojing. Survey of Membership Inference Attacks for Machine Learning [J]. Computer Science, 2023, 50(1): 302-317.
[4]	LENG Dian-dian, DU Peng, CHEN Jian-ting, XIANG Yang. Automated Container Terminal Oriented Travel Time Estimation of AGV [J]. Computer Science, 2022, 49(9): 208-214.
[5]	NING Han-yang, MA Miao, YANG Bo, LIU Shi-chang. Research Progress and Analysis on Intelligent Cryptology [J]. Computer Science, 2022, 49(9): 288-296.
[6]	HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
[7]	LI Yao, LI Tao, LI Qi-fan, LIANG Jia-rui, Ibegbu Nnamdi JULIAN, CHEN Jun-jie, GUO Hao. Construction and Multi-feature Fusion Classification Research Based on Multi-scale Sparse Brain Functional Hyper-network [J]. Computer Science, 2022, 49(8): 257-266.
[8]	ZHANG Guang-hua, GAO Tian-jiao, CHEN Zhen-guo, YU Nai-wen. Study on Malware Classification Based on N-Gram Static Analysis Technology [J]. Computer Science, 2022, 49(8): 336-343.
[9]	CHEN Ming-xin, ZHANG Jun-bo, LI Tian-rui. Survey on Attacks and Defenses in Federated Learning [J]. Computer Science, 2022, 49(7): 310-323.
[10]	XIAO Zhi-hong, HAN Ye-tong, ZOU Yong-pan. Study on Activity Recognition Based on Multi-source Data and Logical Reasoning [J]. Computer Science, 2022, 49(6A): 397-406.
[11]	YAO Ye, ZHU Yi-an, QIAN Liang, JIA Yao, ZHANG Li-xiang, LIU Rui-liang. Android Malware Detection Method Based on Heterogeneous Model Fusion [J]. Computer Science, 2022, 49(6A): 508-515.
[12]	LI Ya-ru, ZHANG Yu-lai, WANG Jia-chen. Survey on Bayesian Optimization Methods for Hyper-parameter Tuning [J]. Computer Science, 2022, 49(6A): 86-92.
[13]	ZHAO Lu, YUAN Li-ming, HAO Kun. Review of Multi-instance Learning Algorithms [J]. Computer Science, 2022, 49(6A): 93-99.
[14]	WANG Fei, HUANG Tao, YANG Ye. Study on Machine Learning Algorithms for Life Prediction of IGBT Devices Based on Stacking Multi-model Fusion [J]. Computer Science, 2022, 49(6A): 784-789.
[15]	XU Jie, ZHU Yu-kun, XING Chun-xiao. Application of Machine Learning in Financial Asset Pricing:A Review [J]. Computer Science, 2022, 49(6): 276-286.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Fair Method for Spectral Clustering to Improve Intra-cluster Fairness

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 15

Metrics

Comments

Recommended 0