Computer Science ›› 2013, Vol. 40 ›› Issue (6): 256-259.

• Artificial Intelligence •

Implementation of Bayesian Inference on the MCDB Distributed Platform

周志敏,高申勇   

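  1. Department of Computer and Information Engineering, Zhejiang University of Water Resources and Electric Power, Hangzhou 310018; College of Electrical Engineering, Zhejiang University, Hangzhou 310058
  • Publication date: 2018-11-16 Release date: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61272539).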

Implementation of Bayesian Inference on MCDB Distributed System

ZHOU Zhi-min and GAO Shen-yong   

  • Online:2018-11-16 Published:2018-11-16

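Abstract (Chinese version): This paper proposes a way to apply Bayesian statistical methods to very large datasets on the distributed database MCDB, and demonstrates it with Bayesian linear regression, the LDA topic model, and a Dirichlet process clustering algorithm. Users run a simulation by defining the relationships among variables in SQL. The paper explores a method for designing large-scale statistical learning systems in concise SQL, relying on MCDB to handle parallelization and resource optimization automatically and thereby obtain high-performance parallel processing.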

Keywords (Chinese version): Bayesian inference, parallel algorithms, SQL, distributed systems

Abstract: This paper describes how the Monte Carlo Database System (MCDB) can be used to implement Bayesian inference via Markov chain Monte Carlo (MCMC) over very large datasets. Bayesian linear regression, latent Dirichlet allocation (LDA), and Dirichlet process clustering serve as examples. To implement an MCMC simulation in MCDB, a programmer uses SQL to specify the dependencies among variables and how they parameterize one another. The paper thus presents a simple scheme for developing large-scale machine learning systems in SQL: MCDB handles parallelization and optimization automatically, yielding efficient parallel computation.

Key words: Bayesian inference, Parallel algorithms, SQL, Distributed system
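
The abstract's central idea is that an MCMC simulation is defined by stating how each variable depends on, and is parameterized by, the others. As a rough single-machine illustration of that idea for the Bayesian linear regression example, the sketch below runs a Gibbs sampler in plain Python. It is not MCDB's SQL interface and not the authors' implementation; the model (a single slope `w` with noise precision `tau`), the conjugate priors, and the names `make_data` and `gibbs_linear_regression` are all assumptions chosen for illustration.

```python
import random
import statistics


def make_data(n, slope, noise_sd, seed):
    """Generate synthetic (x, y) pairs with y = slope * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def gibbs_linear_regression(xs, ys, n_iter=2000, burn_in=500, seed=0,
                            prior_prec=1e-2, a0=1.0, b0=1.0):
    """Gibbs sampler for y_i = w * x_i + noise, noise ~ Normal(0, 1/tau).

    Assumed priors (illustrative only):
      w   ~ Normal(0, 1/prior_prec)
      tau ~ Gamma(shape=a0, rate=b0)
    Returns the post-burn-in samples of w and tau.
    """
    rng = random.Random(seed)
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    w, tau = 0.0, 1.0
    w_samples, tau_samples = [], []
    for it in range(n_iter):
        # w depends on tau and the data: conjugate Normal update.
        prec = prior_prec + tau * sxx
        mean = tau * sxy / prec
        w = rng.gauss(mean, prec ** -0.5)

        # tau depends on w and the data: conjugate Gamma update.
        sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
        rate = b0 + 0.5 * sse
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)

        if it >= burn_in:
            w_samples.append(w)
            tau_samples.append(tau)
    return w_samples, tau_samples


if __name__ == "__main__":
    xs, ys = make_data(n=200, slope=2.0, noise_sd=0.5, seed=1)
    w_s, tau_s = gibbs_linear_regression(xs, ys)
    print("posterior mean of w:  ", statistics.mean(w_s))
    print("posterior mean of tau:", statistics.mean(tau_s))
```

Each update in the loop reads only the current value of the other variable plus sums over the data. Sums of this kind are aggregate queries over a table, which is presumably what makes a SQL formulation over partitioned data attractive at scale.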

