Computer Science, 2018, Vol. 45, Issue 6A: 447-452.

• Big Data and Data Mining •

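Influence Factors Mining of Traffic Accidents Based on Association Rules

JIA Xi-bin, YE Ying-jie, CHEN Jun-cheng

  1. College of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Online: 2018-06-20 Published: 2018-08-03
  • About the authors: JIA Xi-bin (born 1969), female, Ph.D., professor, CCF member; main research interests: visual image processing and cognition. YE Ying-jie (born 1993), female, master's student; main research interest: data mining. CHEN Jun-cheng (born 1980), male, Ph.D., lecturer; main research interests: big data, software testing and analysis. E-mail: juncheng@bjut.edu.cn.
  • Supported by:
    the National Key Research and Development Program of China (2017YFC0803300), the National Natural Science Foundation of China (91646201, 91546111, 61672070, 61672071), and the Key Project of the Beijing Municipal Education Commission (KZ201610005009).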

Abstract: Road traffic safety is a public safety issue: each year, traffic accidents account for the largest share of deaths among all types of safety accidents. With the development of intelligent big-data analysis, traffic data can be used extensively to trace the causes of accidents, which helps in proposing targeted measures to prevent them. Because the causes of traffic accidents are diverse, this paper proposes using news reports of traffic accidents, drawing on the authenticity and timeliness of news coverage, to analyze accident factors and liability. Taking traffic accident news from Sina as the data source, the factors that lead to traffic accidents are extracted from the news events. To address two limitations of the classic Apriori algorithm, namely that it applies only to single-dimension association mining and that it must scan the database frequently, an improved multi-valued attribute Apriori algorithm is proposed. Focusing on provinces and cities, the combinations of factors that lead to accidents are mined, and the patterns of frequently occurring traffic accidents in each province and city are summarized to give the relevant authorities a basis for preventive and regulatory measures.

Key words: Data mining, Database, Multi-valued attribute association rules, Traffic accident

CLC Number: TP181
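
The abstract refers to association mining over multi-valued attributes but does not spell out the improved algorithm. As a rough illustration of the general idea only, the sketch below encodes each record as a set of (attribute, value) items and runs a plain level-wise Apriori search over them. It is not the authors' improved method: the function names, the attribute names, and the five sample records are all hypothetical and were made up for this example.

```python
from itertools import combinations


def apriori(records, min_support):
    """Return {itemset: support} for every frequent itemset.

    Each record is a dict; it is encoded as a set of (attribute, value)
    items, so one attribute can take many values across records.
    """
    transactions = [frozenset(r.items()) for r in records]
    n = len(transactions)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single (attribute, value) items.
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) != k:
                continue
            # A record holds one value per attribute, so skip itemsets
            # that mention the same attribute twice.
            if len({attr for attr, _ in union}) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found


# Hypothetical records, invented for illustration (not data from the paper).
records = [
    {"province": "A", "weather": "rain", "cause": "speeding"},
    {"province": "A", "weather": "rain", "cause": "speeding"},
    {"province": "A", "weather": "clear", "cause": "fatigue"},
    {"province": "B", "weather": "rain", "cause": "speeding"},
    {"province": "B", "weather": "clear", "cause": "drunk driving"},
]

frequent = apriori(records, min_support=0.4)
found = association_rules(frequent, min_confidence=0.9)
for ante, cons, sup, conf in found:
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

This plain version recomputes support by scanning all records at every level, which is the repeated-scan cost the abstract names as a drawback of classic Apriori. The same-attribute check is only a shortcut: such itemsets would have zero support anyway.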