Computer Science, 2021, Vol. 48, Issue (9): 110-117. doi: 10.11896/jsjkx.200900083

Special topic: Intelligent Data Governance Technologies and Systems

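Smart Interactive Guide System for Big Data Analytics

YU Yue-zhang1,2, XIA Tian-yu1,2, JING Yi-nan1,2, HE Zhen-ying1,2, WANG Xiao-yang1,3

  1 School of Computer Science and Technology, Fudan University, Shanghai 201203, China
  2 Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 200433, China
  3 Shanghai Institute of Intelligent Electronics and Systems, Shanghai 201203, China
  • Received: 2020-09-10 Revised: 2021-01-29 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: JING Yi-nan (jingyn@fudan.edu.cn)
  • Author contact: 19212010048@fudan.edu.cn
  • Supported by:
    National Key R&D Program of China (2018YFB1004404)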

  • About author: YU Yue-zhang, born in 1997, master. His main research interests include big data analysis.
    JING Yi-nan, born in 1978, Ph.D., associate professor. His main research interests include big data analysis, spatial and temporal data management, mobile computing, and security and privacy.

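Abstract: Traditional big data tools are generally built for professional data analysts. They are hard to learn, offer poor interactivity, and provide little intelligent assistance. The Smart Interactive Guide System is a set of assistive tools for big data analysis, developed to address these problems in current interactive big data analysis systems. It implements core techniques for user intent understanding, data sampling and column recommendation, visualization recommendation, and analysis method recommendation. It also offers a well-designed graphical interface with a user-friendly, intelligent interaction experience. The system meets a variety of interactive analysis needs while responding quickly. Users can return at any time to any step of an analysis flow and choose a different method for that step. The system can also be integrated quickly with other analysis applications through its interfaces for deployment in different scenarios. In experiments, the average interaction time stays within 3 s, and interactive execution is about 3 times faster than with traditional analysis methods. User case tests show higher satisfaction with the system than with traditional tools. By addressing ease of use, timeliness, interactivity, and intelligence, the system enables users with different levels of expertise to accomplish their big data analysis goals.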
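
The abstract says users can go back to any step of an analysis flow and pick a different method for it. The paper's mechanism is not described here. The following is a minimal Python sketch, assuming a flow is a linear chain of steps in which each step applies one method to the previous step's output. `AnalysisSession`, `Step`, `apply` and `backtrack` are hypothetical names for illustration, not the system's actual interface.

```python
# Hypothetical sketch: a linear analysis flow with backtracking.
# Names are illustrative only, not the system's actual API.
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# A "method" is assumed to be any function from one intermediate result to the next.
Method = Callable[[Any], Any]


@dataclass
class Step:
    name: str            # label shown to the user, e.g. "filter" or "sum"
    method: Method       # analysis method chosen at this step
    output: Any = None   # cached result of applying `method`


@dataclass
class AnalysisSession:
    data: Any                                       # raw input dataset
    steps: List[Step] = field(default_factory=list)

    def _input_of(self, index: int) -> Any:
        # Step i consumes the output of step i-1; step 0 consumes the raw data.
        return self.data if index == 0 else self.steps[index - 1].output

    def apply(self, name: str, method: Method) -> Any:
        """Append a step, run it on the latest result, and cache its output."""
        step = Step(name, method)
        step.output = method(self._input_of(len(self.steps)))
        self.steps.append(step)
        return step.output

    def backtrack(self, index: int, name: str, method: Method) -> Any:
        """Go back to step `index`, replace its method, and drop all later steps."""
        if not 0 <= index < len(self.steps):
            raise IndexError(f"no step {index} in a flow of {len(self.steps)} steps")
        del self.steps[index:]  # earlier steps and their cached outputs are kept
        return self.apply(name, method)

    @property
    def result(self) -> Optional[Any]:
        """Output of the last step, or None if no step has run yet."""
        return self.steps[-1].output if self.steps else None


# Usage: filter, then sum; then go back to step 1 and take the max instead.
session = AnalysisSession(data=[3, 1, 4, 1, 5, 9, 2, 6])
session.apply("filter", lambda xs: [x for x in xs if x > 2])  # [3, 4, 5, 9, 6]
session.apply("sum", sum)                                     # 27
session.backtrack(1, "max", max)                              # 9
print(session.result)                                         # 9
```

Because each step caches its output, backtracking re-runs only the replaced step and reuses the results of the steps before it. A production system might keep a branching history instead, so that abandoned paths can be revisited.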

Key words: Big data system, Method recommendation, Data analysis, User intention, Smart interaction

CLC number: TP311.5