Computer Science, 2015, Vol. 42, Issue (Z11): 42-48.

• Intelligent Computing •

Accelerating the Recovery of Markov Blanket Using Topology Information

FU Shun-kai, SU Zhi-zhen, Sein Minn, LV Tian-yi

  College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding: Supported by the National Natural Science Foundation of China

Abstract: The Markov blanket (MB) of a target variable is known to be the optimal feature subset for predicting its state, and local-search algorithms for inducing the MB have been studied extensively since 1996. This paper proposes FSMB, a novel constraint-based MB induction algorithm that follows a backward-selection search strategy and relies on conditional independence (CI) tests to decide whether the connection between a pair of nodes is spurious and should be removed. Unlike previous constraint-based algorithms, which treat all candidate CI tests equally, FSMB extracts d-separation topology information from the CI tests already performed: it ranks nodes by how likely they are to act as d-separating nodes and then gives priority to CI tests whose conditioning sets contain highly ranked nodes, so spurious edges are identified and removed sooner and the search space shrinks more quickly. Experiments on simulated networks show that FSMB achieves a substantial improvement in time efficiency over the state-of-the-art PCMB and IPC-MB with no loss of learning quality. On larger networks (e.g., 100 and 200 nodes), FSMB is even more efficient than IAMB, which is currently recognized as the fastest algorithm, requiring up to 40% fewer CI tests while producing results of much higher quality. Experiments on 16 UCI data sets with four classical classification models show that the classification accuracy of models trained on the feature subsets output by FSMB is generally close to, or higher than, that of models trained on the full feature set. FSMB is therefore a fast and effective MB induction algorithm and feature subset selector.
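
The abstract describes two mechanisms: a conditional independence (CI) test that decides whether the connection between a pair of nodes is spurious, and a priority ranking that favors CI tests whose conditioning sets contain nodes that have frequently acted as d-separators. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation, and all identifiers (ci_test, DSeparationPriority, ordered_conditioning_sets), the G^2 statistic, and the significance threshold are illustrative choices rather than details taken from the paper.

from collections import Counter, defaultdict
from itertools import combinations
from math import log

from scipy.stats import chi2   # chi-square survival function for the p-value


def ci_test(data, x, y, z, alpha=0.05):
    """G^2 test of independence between variables x and y given the set z.

    data is a list of dicts (one per sample, variable name -> discrete value).
    Returns True when x and y are judged conditionally independent given z,
    i.e. the edge x-y may be spurious and can be removed.
    """
    joint = Counter((tuple(row[v] for v in z), row[x], row[y]) for row in data)
    xz, yz, zc = Counter(), Counter(), Counter()
    for (zv, xv, yv), n in joint.items():
        xz[(zv, xv)] += n
        yz[(zv, yv)] += n
        zc[zv] += n

    g2 = 0.0
    for (zv, xv, yv), n in joint.items():
        expected = xz[(zv, xv)] * yz[(zv, yv)] / zc[zv]
        g2 += 2.0 * n * log(n / expected)

    # Degrees of freedom from the observed configurations (a common approximation).
    x_vals = {row[x] for row in data}
    y_vals = {row[y] for row in data}
    dof = max((len(x_vals) - 1) * (len(y_vals) - 1) * max(len(zc), 1), 1)
    return chi2.sf(g2, dof) > alpha   # large p-value: independence not rejected


class DSeparationPriority:
    """Ranks nodes by how often they appeared in a conditioning set that
    d-separated a pair, and orders candidate conditioning sets so that sets
    containing frequent d-separators are tested first."""

    def __init__(self):
        self.score = defaultdict(int)

    def record_success(self, cond_set):
        # Called whenever a CI test with this conditioning set reported independence.
        for node in cond_set:
            self.score[node] += 1

    def ordered_conditioning_sets(self, candidates, max_size):
        subsets = [frozenset(s) for k in range(max_size + 1)
                   for s in combinations(candidates, k)]
        # Higher accumulated score first: more likely to expose a spurious edge.
        return sorted(subsets, key=lambda s: -sum(self.score[n] for n in s))

In an FSMB-style backward search, one would start from a fully connected neighborhood of the target and, whenever ci_test reports independence, remove the corresponding edge and call record_success so that later edges are probed with the most promising conditioning sets first; this is how, per the abstract, the search space is expected to shrink quickly.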

Key words: Markov blanket, Bayesian network, local search, structure learning, constraint-based learning, conditional independence test

[1] Pearl J.Probabilistic reasoning in expert systems[M].San Mateo:Morgan Kaufmann,1988
[2] Koller D,Sahami M.Toward optimal feature selection[C]∥the 13th International Conference on Machine Learning(ICML).Bari,Italy:Morgan Kaufmann,1996
[3] Chickering D M,Geiger D,Heckerman D.Learning Bayesian Networks is NP-Hard[R].Microsoft Research,1994
[4] Campos C P D,Zeng Z,Ji Q.Efficient structure learning of Bayesian networks using constraints[J].Journal of Machine Learning Research(JMLR),2011,12(11):663-689
[5] Tsamardinos I,Aliferis C,Statnikov A,et al.Algorithms for large scale Markov blanket discovery[C]∥16th International FLAIRS Conference,2003.AAAI,2003
[6] Zhang Y,Zhang Z,Liu K,et al.An improved IAMB algorithm for Markov blanket discovery[J].Journal of Computers,2010,5(11):1755-1761
[7] Zhang Y,Xu H,Huang Y,et al.S-IAMB algorithm for Markov blanket discovery[C]∥Asia-Pacific Conference on Information Processing(APCIP’09).Washington:IEEE Computer Society,2009
[8] Tsamardinos I,Aliferis C F,Statnikov A.Time and sample efficient discovery of Markov blankets and direct causal relations[C]∥9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD).ACM,2003
[9] Aliferis C,Tsamardinos I,Statnikov A.HITON,A Novel Markov Blanket Algorithm for Optimal Variable Selection[C]∥Annual Symposium on American Medical Informatics Association(AMIA).2003
[10] Pena J M,Nilsson R,Bjorkegren J,et al.Towards scalable and data efficient learning of Markov boundaries[J].International Journal of Approximate Reasoning,2007,45(2):211-232
[11] Fu S,Desmarais M C.Fast Markov blanket discovery algorithm via local learning within single pass[C]∥21st Conference of the Canadian Society for Computational Studies of Intelligence(Canadian AI).Springer,2008
[12] Zeng Y X,Xiang H Y,Mao H.Dynamic ordering-based search algorithm for Markov blanket discovery[C]∥15th Pacific-Asia Conference on Data Mining,2011.Shenzhen,China:Springer,2011
[13] Acid S,De Campos L M,Castellano J G.Learning Bayesian network classifiers:Searching in a space of partially directed acyclic graphs[J].Machine Learning,2005,59(3):213-235
[14] Fu S,Minn S,Desmarais M C,Lv T.A Survey of Advances in Feature Selection by Markov Blanket[C]∥ICNC-FSKD,2014.Xiamen,China,2014
[15] Koller D,Friedman N.Probabilistic graphical models:Principles and Techniques[M].MIT Press,2009
[16] Bromberg F,Margaritis D,Honavar V.Efficient Markov network structure discovery using independence tests[J].Journal of Artificial Intelligence Research,2009,35(1):449-484
[17] Fu S,Desmarais M C.Tradeoff analysis of different Markov blanket local learning approaches[C]∥12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining(PAKDD).Osaka,Japan:Springer,2008
[18] Duda R O,Hart P E.Pattern Classification and Scene Analysis[M].John Wiley & Sons,1973
[19] Zhang H,Jiang L,Su J.Hidden Naive Bayes[C]∥AAAI.2005
