Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children

doi:10.11896/jsjkx.200500082

Computer Science, 2021, Vol. 48, Issue (6): 210-214

XU Hui-hui, YAN Hua

School of Computer Science & Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Received: 2020-05-19  Revised: 2020-08-08  Publication date: 2021-06-15  Release date: 2021-06-03
Corresponding author: YAN Hua (huayan@uestc.edu.cn)
About the authors: XU Hui-hui, born in 1995, postgraduate student. Her main research interests include data mining. (494655043@qq.com)
YAN Hua, born in 1970, Ph.D., associate professor. Her main research interests include computational intelligence and data mining.
Supported by: National Natural Science Foundation of China (61976046) and Key Research and Development Projects of Sichuan Province (2018SZ0065).

Abstract: Analyzing disease-related risk factors is an important application of data mining in the medical field: it helps doctors identify the causes of a disease and carry out prevention and control work effectively. However, medical disease data have characteristics of their own. In particular, they are often highly imbalanced, so a large amount of valuable information resides in attribute items with low support, and applying a classical support-based association rule mining algorithm directly can easily lose important information. Therefore, drawing on medical domain knowledge and on relative risk, a statistical measure commonly used in medicine, this paper proposes a Mining Algorithm for high Relative Risk Itemsets (MARRI), together with two matching rule pruning methods, interaction pruning and sample number pruning, and validates the algorithm on a dataset of congenital heart disease in children. Experimental results show that the algorithm can mine information from low-support itemsets and that the disease-related factors it finds are more valuable.

Key words: Association rules, Data mining, Disease analysis, Relative risk
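The abstract names relative risk as the screening measure but gives no formula here. In standard epidemiology, relative risk compares disease incidence among records exposed to a factor with incidence among unexposed records: with a 2x2 table of counts a (exposed, diseased), b (exposed, healthy), c (unexposed, diseased) and d (unexposed, healthy), RR = [a/(a+b)] / [c/(c+d)]. The Python sketch below illustrates why such a measure can surface low-support itemsets that a support threshold would discard. It is not the authors' MARRI implementation. The toy records, the function and parameter names (`mine_high_rr`, `min_rr`, `min_exposed`), and both pruning checks are hypothetical stand-ins for the sample number pruning and interaction pruning named in the abstract, whose exact rules are not given in this section.

```python
import math
from itertools import combinations

# Hypothetical toy data, invented for illustration (NOT the paper's dataset).
# Each group is (attribute items, has_disease, number of records).
GROUPS = [
    ({"A", "B"}, True, 2),
    ({"A"}, True, 2),
    ({"A", "B"}, False, 1),
    ({"B"}, True, 3),
    ({"B"}, False, 54),
    (set(), True, 3),
    (set(), False, 35),
]
RECORDS = [(frozenset(items), diseased)
           for items, diseased, n in GROUPS for _ in range(n)]


def support(records, itemset):
    """Fraction of all records that contain every item in `itemset`."""
    return sum(1 for items, _ in records if itemset <= items) / len(records)


def relative_risk(records, itemset):
    """RR = [a/(a+b)] / [c/(c+d)] from the exposure-by-disease 2x2 table.

    A record is 'exposed' if it contains every item in `itemset`.
    Returns None when RR is undefined and math.inf when the exposed group
    has cases but the unexposed group has none.
    """
    a = b = c = d = 0
    for items, diseased in records:
        exposed = itemset <= items
        if exposed and diseased:
            a += 1
        elif exposed:
            b += 1
        elif diseased:
            c += 1
        else:
            d += 1
    if a + b == 0 or c + d == 0:
        return None  # no exposed or no unexposed records
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    if risk_unexposed == 0:
        return math.inf if risk_exposed > 0 else None
    return risk_exposed / risk_unexposed


def mine_high_rr(records, max_size=2, min_rr=2.0, min_exposed=3):
    """Enumerate itemsets of up to `max_size` items and keep high-RR ones.

    Both pruning checks below are ASSUMED illustrations, not the paper's rules.
    """
    items = sorted(set().union(*(its for its, _ in records)))
    results = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            exposed = sum(1 for its, _ in records if itemset <= its)
            # Assumed sample-number check: too few exposed records -> skip.
            if exposed < min_exposed:
                continue
            rr = relative_risk(records, itemset)
            if rr is None or rr < min_rr:
                continue
            # Assumed interaction check: a combination must have a higher RR
            # than every one of its proper non-empty subsets.
            sub_rrs = (relative_risk(records, frozenset(sub))
                       for j in range(1, k) for sub in combinations(combo, j))
            if any(s is not None and s >= rr for s in sub_rrs):
                continue
            results.append((itemset, support(records, itemset), rr))
    return results


if __name__ == "__main__":
    for itemset, sup, rr in mine_high_rr(RECORDS):
        print(sorted(itemset), f"support={sup:.2f}", f"RR={rr:.2f}")
```

Run as a script, the sketch prints `['A'] support=0.05 RR=12.67`. Item A appears in only 5% of the toy records, so a classical minimum support of, say, 0.1 would drop it, yet its relative risk is about 12.7. Item B is frequent (support 0.6) but has an RR below 1. The combined itemset {A, B} passes the assumed sample-count check (3 exposed records) but is discarded by the assumed interaction check, because its RR (about 8.1) does not exceed that of {A} alone.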

CLC number: TP181

XU Hui-hui, YAN Hua. Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children[J]. Computer Science, 2021, 48(6): 210-214. https://doi.org/10.11896/jsjkx.200500082
