Computer Science ›› 2018, Vol. 45 ›› Issue (9): 113-118. doi: 10.11896/j.issn.1002-137X.2018.09.017

• The 16th National Conference on Software and Applications •

Big Data Driven Analysis of Knowledge Exchange Network in Developer Community

DA Yi-fei1, LIU Xu-dong2, SUN Hai-long2

  1. Sino-French Engineer School, Beihang University, Beijing 100083, China
  2. School of Computer Science and Engineering, Beihang University, Beijing 100083, China
  • Received: 2017-10-03 Online: 2018-09-20 Published: 2018-10-10
  • Corresponding author: SUN Hai-long (male, Ph.D., associate professor; main research interests include software development and distributed computing), E-mail: sunhl@act.buaa.edu.cn
  • About the authors: DA Yi-fei (female, master's student; main research interests include software development and data mining); LIU Xu-dong (male; main research interests include software development methods and tools, and software middleware technology)
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016YFB1000804).

Abstract: Developer communities typically comprise several sections, such as blogs, Q&A, and forums, which together form a platform where users contribute and exchange software development knowledge. Taking CSDN as the object of study, this paper constructs developer knowledge exchange networks from the big data accumulated on the CSDN platform and analyzes them using complex network theory. The analysis shows that the multi-section knowledge exchange network exhibits complex-network properties such as the small-world effect and the scale-free property. Further analysis of the distribution of knowledge contributors, based on these networks, shows that multi-section users include a relatively large share of knowledge contributors, who play a relatively important role in the knowledge exchange network.

Key words: Big data, Complex network, Developer knowledge exchange, Data analysis

CLC Number:

  • TP391
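
The abstract refers to the small-world effect and the scale-free property without showing how such properties are measured. The following is a minimal sketch, not the authors' implementation: it assumes each interaction (for example, a reply between two users) is recorded as an undirected edge, and the user names and the `toy_interactions` data are hypothetical. A small-world network is usually characterized by a high average clustering coefficient together with a short average path length, and a scale-free network by a heavy-tailed degree distribution.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(interactions):
    """Build an undirected adjacency map from (user_a, user_b) pairs."""
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore users replying to themselves
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs, via BFS from each node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    counts = defaultdict(int)
    for nbrs in adj.values():
        counts[len(nbrs)] += 1
    return dict(counts)


# Hypothetical toy data: four users and four interactions.
toy_interactions = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
toy_graph = build_graph(toy_interactions)

print("average clustering:", average_clustering(toy_graph))
print("average path length:", average_path_length(toy_graph))
print("degree distribution:", degree_distribution(toy_graph))
```

On real community data, these values would typically be compared with those of a random graph of the same size, and the degree distribution would be checked on a log-log scale for power-law behavior; both steps are omitted from this sketch.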