Computer Science (计算机科学), 2019, Vol. 46, Issue 6A: 427-430.

• Big Data & Data Mining •

Educational Administration Data Mining of Association Rules Based on Domain Association Redundancy

LU Xin-yun1, WANG Xing-fen2

  1. Computer School, Beijing Information Science and Technology University, Beijing 100192, China;
  2. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China
  • Online: 2019-06-14 Published: 2019-07-02
  • Corresponding author: WANG Xing-fen (born 1968), female, Ph.D., professor. Main research interests: data mining and e-commerce. E-mail: xfwang@bistu.edu.cn
  • About the author: LU Xin-yun (born 1994), female, master's student. Main research interests: data processing technology and data mining. E-mail: 918047774@qq.com

Abstract: Because teaching is periodic and the teaching environment changes over time, educational administration data in colleges and universities are time-ordered and contain considerable association redundancy, which makes it difficult to mine useful and interesting association rules. Sequential pattern mining algorithms can mine time-ordered frequent itemsets, but they cannot eliminate the association redundancy in educational administration data, so the utility and novelty of their results fall short of requirements. This paper therefore proposes FUI_DK, an association rule mining algorithm based on association redundancy in the education domain. FUI_DK generates frequent candidate itemsets with a sequential pattern mining algorithm, adds two parameters, utility and interest, to the support and confidence of classical association rule algorithms to obtain high-utility interesting itemsets, and outputs the rules that satisfy the conditions sorted by support, confidence, and utility, yielding association rules with high utility and interest. Comparative experiments and an analysis of the mining results were carried out on student educational administration data from a university. The results show that the algorithm shortens running time and that the elimination rate of association rules already known in the domain can reach 43%, which can help colleges and universities carry out time-saving and effective educational data mining.

Key words: Association rules, Domain knowledge, Educational administration data, High utility and interesting itemsets, Sequential pattern mining
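
The final filtering and ranking stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FUI_DK implementation: the abstract gives no formulas for utility or interest, so the sketch assumes that utility is precomputed per rule, that a rule counts as uninteresting when it appears in a set of rules already known from domain knowledge, and that sorting is descending by support, then confidence, then utility. The names `Rule`, `filter_and_rank`, and `elimination_rate`, and the definition of the elimination rate used here, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One candidate association rule with its precomputed measures."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float
    utility: float  # assumed precomputed; the abstract gives no formula


def filter_and_rank(rules, known_rules, min_sup, min_conf, min_util):
    """Keep rules that pass all thresholds and are not already known, then sort.

    known_rules is a set of (antecedent, consequent) pairs standing in for
    domain knowledge. Treating membership in it as "uninteresting" is an
    assumption of this sketch.
    """
    kept = [
        r for r in rules
        if r.support >= min_sup
        and r.confidence >= min_conf
        and r.utility >= min_util
        and (r.antecedent, r.consequent) not in known_rules
    ]
    # Assumed ordering: descending support, then confidence, then utility.
    return sorted(
        kept,
        key=lambda r: (r.support, r.confidence, r.utility),
        reverse=True,
    )


def elimination_rate(rules, known_rules):
    """Share of candidate rules that match domain knowledge (assumed definition)."""
    if not rules:
        return 0.0
    known = sum((r.antecedent, r.consequent) in known_rules for r in rules)
    return known / len(rules)
```

In this sketch a known prerequisite pair, such as a first course followed by its sequel, would be placed in `known_rules` so that it is dropped before ranking. The course names and thresholds a caller supplies are illustrative and do not come from the paper's data.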

CLC Number: TP399