Computer Science (计算机科学), 2021, Vol. 48, Issue (6): 210-214. doi: 10.11896/jsjkx.200500082

• Artificial Intelligence •


Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children

XU Hui-hui (徐慧慧), YAN Hua (晏华)

  1. School of Computer Science & Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2020-05-19 Revised: 2020-08-08 Online: 2021-06-15 Published: 2021-06-03
  • Corresponding author: YAN Hua (huayan@uestc.edu.cn)
  • About author: XU Hui-hui, born in 1995, postgraduate. Her main research interests include data mining. (494655043@qq.com)
    YAN Hua, born in 1970, Ph.D., associate professor. Her main research interests include computational intelligence and data mining.
  • Supported by:
    National Natural Science Foundation of China (61976046) and Key Research and Development Projects of Sichuan Province (2018SZ0065).

Abstract: Analyzing disease-related risk factors is an important application of data mining in the medical field: it helps doctors analyze the causes of disease and carry out prevention and treatment effectively. Disease data have their own characteristics. In particular, they are highly imbalanced, so a large amount of valuable information often lies in attribute items with low support, and directly applying classical support-based association rule mining algorithms easily loses important information. Drawing on medical domain knowledge and on relative risk, a statistical measure commonly used in medicine, this paper therefore proposes a Mining Algorithm for high Relative Risk Itemsets (MARRI), together with two matching rule pruning methods, interaction pruning and sample number pruning, and validates the algorithm on a dataset of congenital heart disease in children. Experimental results show that the algorithm can mine information from low-support itemsets and that the disease-related factors it finds are more valuable.

Key words: Association rules, Data mining, Disease analysis, Relative risk
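
To make the abstract's contrast between support and relative risk concrete, the following is a minimal sketch that computes both measures for a single itemset. It uses the standard epidemiological definition of relative risk (disease rate among records containing the itemset divided by the rate among records without it). It is not the MARRI algorithm itself, whose details are not given here. The item name `maternal_infection`, the record layout, and all counts are made-up assumptions for illustration only.

```python
from typing import FrozenSet, List, Optional, Tuple

# Assumed record layout: (set of items present, whether the child has the disease).
Record = Tuple[FrozenSet[str], bool]


def support(records: List[Record], itemset: FrozenSet[str]) -> float:
    """Fraction of all records that contain every item in `itemset`."""
    hits = sum(1 for items, _ in records if itemset <= items)
    return hits / len(records)


def relative_risk(records: List[Record], itemset: FrozenSet[str]) -> Optional[float]:
    """Disease rate among exposed records divided by the rate among unexposed ones.

    Returns None when either group is empty or the unexposed rate is zero,
    because the ratio is undefined in those cases.
    """
    exposed_cases = exposed_total = 0
    unexposed_cases = unexposed_total = 0
    for items, diseased in records:
        if itemset <= items:
            exposed_total += 1
            exposed_cases += int(diseased)
        else:
            unexposed_total += 1
            unexposed_cases += int(diseased)
    if exposed_total == 0 or unexposed_total == 0 or unexposed_cases == 0:
        return None
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


# Hypothetical toy data: 100 records, only 5 of which contain the item.
item = frozenset({"maternal_infection"})
records: List[Record] = (
    [(item, True)] * 4            # exposed, diseased
    + [(item, False)] * 1         # exposed, healthy
    + [(frozenset(), True)] * 19  # unexposed, diseased
    + [(frozenset(), False)] * 76 # unexposed, healthy
)

item_support = support(records, item)
item_rr = relative_risk(records, item)
print(f"support={item_support:.2f}, relative risk={item_rr:.2f}")
```

In this toy data the itemset appears in only 5% of records, so a typical minimum-support threshold would discard it, yet the disease rate among exposed records is several times the rate among unexposed ones. This is the kind of low-support, high-risk pattern the abstract says support-based mining tends to lose.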


  • TP181