Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 98-102.

• Intelligent Computing •

Improved CoreSets Construction Algorithm for Bayesian Logistic Regression

ZHANG Shi-xiang, LI Wang-geng, LI Tong, ZHU Nan-nan   

  1. (School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241000, China)
  • Online: 2019-11-10 Published: 2019-11-20

Abstract: With the rapid development of the Internet, new methods of information dissemination keep emerging, and data volumes are growing at an unprecedented rate. How to process and analyze massive raw data and turn it into usable knowledge has become an important topic of common concern for scientists and technical experts at home and abroad. The Bayesian approach offers rich hierarchical models, uncertainty quantification and prior specification, which makes it very attractive for large-scale data. The limited-iteration bisecting K-means algorithm approximately preserves the clustering quality of the standard bisecting K-means algorithm at higher computational efficiency, so it is better suited to large data sets that require fast processing. To address the low execution efficiency of the original coresets construction algorithm, the construction is improved with the limited-iteration bisecting K-means algorithm: the clustering result is obtained faster without sacrificing clustering quality, the weights of the relevant data points are then calculated, and the coreset is constructed from them. Experiments show that, compared with the original algorithm, the improved algorithm has higher computational efficiency and similar approximation performance, and in some cases it approximates better.
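
The pipeline summarized in the abstract (cluster quickly, then weight and sample the data points) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the parameters k, M, R and max_iter, the choice to split the cluster with the largest squared error, and the sensitivity-style bound used for weighting are all assumptions made for illustration; the abstract does not specify any of them. Labels are assumed to be in {-1, +1}.

```python
import numpy as np


def two_means(Z, rng, max_iter):
    """Split Z into two groups using at most max_iter Lloyd iterations."""
    centers = Z[rng.choice(len(Z), size=2, replace=False)].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def limited_bisecting_kmeans(Z, k, max_iter=3, seed=0):
    """Return a list of index arrays, one per cluster (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(Z))]
    while len(clusters) < k:
        # Assumption: always split the cluster with the largest squared error.
        sse = [((Z[c] - Z[c].mean(axis=0)) ** 2).sum() if len(c) > 1 else -1.0
               for c in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        if len(idx) < 2:  # nothing left to split
            clusters.append(idx)
            break
        labels = two_means(Z[idx], rng, max(1, max_iter))
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:  # degenerate split: cut in half
            half = len(idx) // 2
            left, right = idx[:half], idx[half:]
        clusters += [left, right]
    return clusters


def build_coreset(X, y, k=4, M=50, R=3.0, max_iter=3, seed=0):
    """Sample a weighted subset; the bound below is an assumed form."""
    Z = X * y[:, None]  # fold the +/-1 label into the features
    N = len(Z)
    clusters = limited_bisecting_kmeans(Z, k, max_iter, seed)
    sizes = np.array([len(c) for c in clusters])
    means = np.stack([Z[c].mean(axis=0) for c in clusters])
    dists = np.linalg.norm(Z[:, None, :] - means[None, :, :], axis=2)
    # Sensitivity-style score: points far from every cluster mean score high.
    m = N / (1.0 + (sizes[None, :] * np.exp(-R * dists)).sum(axis=1))
    p = m / m.sum()
    counts = np.random.default_rng(seed + 1).multinomial(M, p)
    keep = counts > 0
    weights = counts[keep] / (M * p[keep])  # importance weights
    return np.flatnonzero(keep), weights
```

Calling build_coreset(X, y, k=5, M=40) on an (N, D) feature array X returns the indices of the sampled points and their weights. Capping max_iter at a small value is what trades a little clustering quality for speed in this sketch.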

Key words: Bayesian logistic regression, Coresets, Large-scale dataset, Limited-iteration bisecting K-means

CLC Number: 

  • TP391