Computer Science, 2019, Vol. 46, Issue (8): 42-49. doi: 10.11896/j.issn.1002-137X.2019.08.007

• Big Data & Data Science •

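Visual Analysis Method for the Spatio-Temporal Evolution of Tweet Topics Based on Geo-Tags

Published English title: Spatio-Temporal Evolution of Geographical Topics

SUN Guo-dao, ZHOU Zhi-xiu, LI Si, LIU Yi-peng, LIANG Rong-hua

  1. (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China)
  • Received: 2018-11-26 Online: 2019-08-15 Published: 2019-08-15
  • Corresponding author: LIANG Rong-hua (born 1974), male, Ph.D., professor, doctoral supervisor, senior member of CCF. His main research interests include information visualization and computer graphics and image processing. E-mail: rhliang@zjut.edu.cn
  • About the authors: SUN Guo-dao (born 1988), male, Ph.D., lecturer, member of CCF. His main research interest is information visualization. E-mail: guodao@zjut.edu.cn; ZHOU Zhi-xiu (born 1994), female, master's student. Her main research interest is information visualization; LI Si (born 1992), male, master's student. His main research interest is information visualization; LIU Yi-peng (born 1987), male, Ph.D., lecturer, member of CCF. His main research interests include medical imaging and image analysis
  • Supported by:
    National Natural Science Foundation of China (61602409)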

Abstract: Tweets posted on social media record a wide range of information about their authors. The text carries the topics a tweet discusses along with temporal and spatial information, and analyzing how topics evolve over time and space from this information is of considerable research value. This paper designs a visual analysis pipeline for mining tweet data that presents the spatio-temporal evolution of tweet topics from multiple perspectives through user interaction. First, based on a portion of historical tweet data, the global geographic space is partitioned into regions using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm combined with Thiessen (Voronoi) polygons. Then, for a topic of interest queried by the user, all relevant tweets are retrieved through an index and bound to the cluster centers. Finally, several visualization views that combine a time-series clustering algorithm with an adaptive layout algorithm present the spatio-temporal evolution of the topic. Experiments and analysis on tweet data collected and stored through the official Twitter API show that the improved adaptive layout algorithm for the visualization views effectively resolves graphic occlusion and fully presents the spatio-temporal evolution patterns of tweets, and that the geographic partitioning and the visualization components effectively help researchers analyze the spatio-temporal evolution of tweets and the distribution of hot topics of global concern.

Key words: Clustering, Visual analysis process, Spatio-temporal evolution, Tweet topic, Adaptive layout algorithm

CLC Number:

  • TP391
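
The region-partitioning step summarized in the abstract (DBSCAN on historical tweet locations, followed by a Thiessen/Voronoi partition around the cluster centers) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `eps_km` and `min_pts` values, the use of a plain arithmetic mean for cluster centers, and the sample coordinates are all assumptions made for illustration. The sketch relies on the fact that assigning a point to its nearest center is equivalent to locating it in that center's Voronoi cell.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None marks an unvisited point

    def neighbors(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels

def cluster_centers(points, labels):
    """Mean (lat, lon) per cluster; a naive mean that ignores the antimeridian."""
    sums = {}
    for (lat, lon), lab in zip(points, labels):
        if lab == -1:
            continue
        s = sums.setdefault(lab, [0.0, 0.0, 0])
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

def assign_region(point, centers):
    """Nearest center = the Voronoi (Thiessen) cell that contains the point."""
    return min(centers, key=lambda lab: haversine_km(point, centers[lab]))

# Made-up sample locations: three near Hangzhou, three near New York, one outlier.
history = [(30.27, 120.15), (30.30, 120.20), (30.25, 120.10),
           (40.71, -74.00), (40.73, -73.99), (40.70, -74.02),
           (0.0, 0.0)]
labels = dbscan(history, eps_km=50, min_pts=3)
centers = cluster_centers(history, labels)
print(labels)                                   # [0, 0, 0, 1, 1, 1, -1]
print(assign_region((31.23, 121.47), centers))  # 0 (falls in the Hangzhou cell)
```

In this sketch a newly retrieved tweet is bound to a region by a nearest-center lookup, so the polygon geometry itself is only needed for drawing the map. A production pipeline would more likely use a spatial index and a library implementation of both steps.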