Computer Science ›› 2017, Vol. 44 ›› Issue (1): 277-282. doi: 10.11896/j.issn.1002-137X.2017.01.051

Research and Application of Social Network Data Acquisition Technology

XU Yan-fei, LIU Yuan and WU Wen-peng   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: With the rapid development of social networks, research on them is steadily deepening, and acquiring basic social network data is essential to that research. Building on existing data acquisition programs, and following the Sina authorization standard and the latest microblog login encryption, this paper studies two acquisition programs. One obtains data through the API after OAuth 2.0 authorization; the other crawls data with a Web crawler after simulating login with RSA2 encryption. The paper also studies acquiring microblog data by applying appropriate acquisition rules. All three acquisition programs collect data effectively, and each has its own characteristics. Based on the requirements of data acquisition, the paper proposes a fusion of the different acquisition programs. Experiments show that the fusion strategy can obtain large amounts of data quickly and efficiently.

Key words: Python, Microblog API, Simulated login, Web crawler, Collector, Fusion strategy
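
The abstract does not spell out how the fusion strategy combines the three acquisition approaches, so the following is only a minimal sketch of one plausible reading: run each source in a priority order, tolerate a source that fails, and merge the results while dropping duplicate posts. Every name here (`fuse_collectors`, `api_collector`, `crawler_collector`, `rule_collector`, and the `id`/`text` record fields) is hypothetical and is not taken from the paper; the three collectors are stubs that return canned data rather than calling any real Weibo endpoint.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Collector = Callable[[str], Iterable[Record]]


def fuse_collectors(user_id: str, collectors: List[Collector]) -> List[Record]:
    """Run each collector in order and merge results, skipping duplicate ids."""
    seen = set()
    merged: List[Record] = []
    for collect in collectors:
        try:
            records = list(collect(user_id))
        except Exception:
            # Assumption: a failing source (rate limit, expired login, missing
            # rules) should not prevent the remaining sources from running.
            continue
        for record in records:
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append(record)
    return merged


# Hypothetical stand-ins for the three acquisition programs in the abstract.
def api_collector(user_id: str) -> List[Record]:
    return [{"id": "1", "text": "post from API"}]


def crawler_collector(user_id: str) -> List[Record]:
    return [
        {"id": "1", "text": "post from crawler"},
        {"id": "2", "text": "older post"},
    ]


def rule_collector(user_id: str) -> List[Record]:
    raise RuntimeError("collector rules not configured")


if __name__ == "__main__":
    posts = fuse_collectors("u1", [api_collector, crawler_collector, rule_collector])
    for post in posts:
        print(post["id"], post["text"])
```

Placing the API source first means that, when two sources return the same post, the API version is the one kept; whether that matches the ordering used in the paper's experiments is not stated in the abstract.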

