Computer Science, 2018, Vol. 45, Issue (1): 1-13. doi: 10.11896/j.issn.1002-137X.2018.01.001


Data Science Studies: State-of-the-art and Trends

CHAO Le-men, XING Chun-xiao and ZHANG Yong   

Online: 2018-01-15; Published: 2018-11-13

Abstract: The arrival of the big data era has given rise to a new discipline called data science. First, based on an in-depth discussion of its basic concepts, brief history, scientific role, and body of knowledge, the differences between domain-general and domain-specific data science were identified. Second, the top ten challenges facing data science were identified by describing the debates on ten paradoxical topics: the shift in thinking pattern (knowledge pattern or data pattern), perspectives on data (active or negative), the implementation of intelligence (via AI or via big data), the bottleneck of data product development (computing-intensive or data-intensive), data preparation (data preprocessing or data wrangling), quality of service (service performance or user experience), data analysis (explanatory or predictive), algorithm evaluation (by complexity or by scalability), the research paradigm (third paradigm or fourth paradigm), and the main motivation of education (to cultivate data engineers or data scientists). Then, the top ten trends in data science studies were proposed: valuing predictive models and correlation analysis; paying more attention to model integration and meta-analysis; embracing the "data first, model later or never" paradigm; being led by realism while ensuring data consistency; supporting multiple copies and data locality; the coexistence of diverse implementation technologies and integrated applications; being dominated by simple computing and pragmatism; developing data products and embedded applications of data science; embracing Pro-Am and metadata; and cultivating data scientists and developing curricula or majors. Finally, some suggestions for further study were proposed.

Key words: Data science, Big data, Data product development, Data wrangling, Data-driven

