Computer Science (计算机科学), 2018, Vol. 45, Issue (6A): 437-441.

• Big Data & Data Mining •

Research on Data Mining Algorithm Based on Examination Process and Knowledge Structure

DAI Ming-zhu, GAO Song-feng

  1. School of Mechanical-electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2018-06-20 Published: 2018-08-03
  • About the authors: DAI Ming-zhu (born 1991), female, master's student; her main research interest is management information systems. GAO Song-feng (born 1972), male, Ph.D., associate professor; his main research interests are management information systems and biomechanics. E-mail: gaosongfeng@bucea.edu.cn.

Abstract: To study how well students master knowledge points at different stages, this work combines knowledge structure with examination results, building on data mining theory. Based on educational measurement and the decision tree algorithm from data mining, an improved algorithm is proposed on the basis of the original C4.5 algorithm. The difficulty level and the category of the knowledge points covered in the test papers are used to refine the knowledge structure, so as to determine how well individual students or groups of students have mastered each knowledge point and how the knowledge points in a test paper relate to one another. The results show that the improved algorithm is more efficient and that its formula is simpler and more practical than the original one. Using the decision tree model, the remaining data are used to verify the improved formula, which more quickly yields the conclusion that mastery of the programming knowledge point is a relatively important factor affecting scores. Verifying the constructed decision tree on test data gives an accuracy of 90%. Finally, the decision tree is visualized, providing a useful reference for students' study planning and for teachers' teaching plans and arrangements.

Key words: C4.5, Data mining, Decision tree, Knowledge structure, Test paper analysis

CLC Number: 

  • TP391
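
The abstract refers to the original C4.5 algorithm without reproducing its formula on this page. As a point of reference only, the following minimal sketch shows the standard C4.5 split criterion (gain ratio), not the improved formula proposed in the paper. The records, the attribute names (`programming`, `data_types`, `result`), and the function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Standard C4.5 gain ratio for splitting `rows` on attribute `attr`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: mastery level per knowledge point and overall result.
records = [
    {"programming": "good", "data_types": "good", "result": "pass"},
    {"programming": "good", "data_types": "poor", "result": "pass"},
    {"programming": "poor", "data_types": "good", "result": "fail"},
    {"programming": "poor", "data_types": "poor", "result": "fail"},
]

# The attribute with the highest gain ratio would be chosen as the split node.
best = max(["programming", "data_types"], key=lambda a: gain_ratio(records, a, "result"))
```

In this made-up data set the `programming` attribute separates passes from fails perfectly, so it would be selected as the root split. This only illustrates how a decision tree can rank knowledge points by their influence on scores; it makes no claim about the paper's actual data or results.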