Computer Science ›› 2019, Vol. 46 ›› Issue (10): 316-321. doi: 10.11896/jsjkx.180901624

• Interdisciplinary & Frontier •

Fine-grained Geolocalisation of User Generated Short Text Based on LBSN

DENG Yao1, JI Wen-li1, LI Yong-jun2, GAO Xing1

  1. School of Communication and Information Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
  2. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2018-09-03 Revised: 2019-01-06 Online: 2019-10-15 Published: 2019-10-21
  • Corresponding author: LI Yong-jun (born 1973), male, Ph.D., associate professor. His main research interests are social computing and social networks. E-mail: lyj@nwpu.edu.cn.
  • About the authors: DENG Yao (born 1993), male, master's student. His main research interest is natural language processing. JI Wen-li (born 1973), female, M.S., associate professor. Her main research interests are data analysis and data mining. GAO Xing (born 1994), female, master's student. Her main research interests are social computing and social networks.
  • Supported by:
    This work was supported by the Natural Science Basic Research Plan of Shaanxi Province (2018JM6063).

Abstract: Inferring a user's fine-grained location from user generated short text (UGST) is important for applications built on location-based services. Existing fine-grained geolocalisation methods rarely exploit the semantic information in UGST and do not weight the semantic entities it contains, which limits their performance. To address these problems, a fine-grained geolocalisation method for UGST based on location-based social networks (LBSN) is proposed. The method consists of three steps: 1) UGST from Foursquare is used to build an association model between entities and locations, which addresses the sparsity of location annotations; 2) each UGST whose location is to be inferred is checked for location information, and any UGST that contains no location-specific semantic information is filtered out, which eliminates interference from noisy short texts; 3) candidate locations are inferred from the textual content of each remaining UGST and ranked, and the top-ranked location is selected as the inferred location. Experimental results verify the effectiveness of the proposed method.

Key words: Fine-grained, Geolocalisation, LBSN, Position estimation, Short text
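The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting or ranking formulas, so the IDF-style entity weight, the scoring rule, the function names, and the toy data below are all assumptions made for illustration. Entity extraction is assumed to have been done already, so each text is represented as a list of entity strings.

```python
import math
from collections import Counter, defaultdict


def build_entity_location_model(tips):
    """Step 1 (sketch): count how often each entity co-occurs with each location.

    `tips` is an iterable of (location_id, entities) pairs, standing in for
    location-annotated short texts such as Foursquare tips.
    """
    model = defaultdict(Counter)  # entity -> Counter of location counts
    for location, entities in tips:
        for entity in set(entities):
            model[entity][location] += 1
    return model


def entity_weight(entity, model, n_locations):
    """Assumed IDF-style weight: entities tied to fewer locations count more."""
    return math.log(n_locations / len(model[entity])) + 1.0


def infer_location(entities, model, n_locations):
    """Steps 2 and 3 (sketch): filter noisy texts, then rank candidate locations."""
    known = [e for e in entities if e in model]
    if not known:
        return None  # step 2: no location-specific entity, so the text is filtered out
    scores = Counter()
    for entity in known:
        weight = entity_weight(entity, model, n_locations)
        total = sum(model[entity].values())
        for location, count in model[entity].items():
            # Weighted share of this entity's mentions that fall on this location.
            scores[location] += weight * count / total
    # Step 3: return the top-ranked candidate; ties are broken by location id.
    return max(scores.items(), key=lambda item: (item[1], item[0]))[0]


# Hypothetical toy data, not taken from the paper.
tips = [
    ("cafe_a", ["latte", "wifi"]),
    ("cafe_a", ["latte"]),
    ("museum_b", ["dinosaur", "wifi"]),
]
model = build_entity_location_model(tips)
n_locations = len({location for location, _ in tips})

print(infer_location(["dinosaur", "wifi"], model, n_locations))
print(infer_location(["hello"], model, n_locations))
```

In this sketch an entity that appears at every location (such as `wifi` in the toy data) receives the minimum weight, while an entity seen at a single location dominates the ranking, which is one simple way to reflect the idea of weighting entities by importance.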

CLC Number: 

  • TP311
[1] 吕晓锋, 赵书良, 高恒达, 武永亮, 张宝奇.
Short Texts Feature Enrichment Method Based on Heterogeneous Information Network
Computer Science, 2022, 49(9): 92-100. https://doi.org/10.11896/jsjkx.210700241
[2] 曾志贤, 曹建军, 翁年凤, 蒋国权, 徐滨.
Fine-grained Semantic Association Video-Text Cross-modal Entity Resolution Based on Attention Mechanism
Computer Science, 2022, 49(7): 106-112. https://doi.org/10.11896/jsjkx.210500224
[3] 邵欣欣.
TI-FastText Automatic Goods Classification Algorithm
Computer Science, 2022, 49(6A): 206-210. https://doi.org/10.11896/jsjkx.210500089
[4] 张文轩, 吴秦.
Fine-grained Image Classification Based on Multi-branch Attention-augmentation
Computer Science, 2022, 49(5): 105-112. https://doi.org/10.11896/jsjkx.210100108
[5] 刘硕, 王庚润, 彭建华, 李柯.
Chinese Short Text Classification Algorithm Based on Hybrid Features of Characters and Words
Computer Science, 2022, 49(4): 282-287. https://doi.org/10.11896/jsjkx.210200027
[6] 李浩, 张兰, 杨兵, 杨海潇, 寇勇奇, 王飞, 康雁.
Fine-grained Sentiment Classification of Chinese Microblogs Combining Dual Weight Mechanism and Graph Convolutional Neural Network
Computer Science, 2022, 49(3): 246-254. https://doi.org/10.11896/jsjkx.201200073
[7] 张虎, 柏萍.
Graph Convolutional Networks with Long-distance Words Dependency in Sentences for Short Text Classification
Computer Science, 2022, 49(2): 279-284. https://doi.org/10.11896/jsjkx.201200062
[8] 史伟, 付月.
Microblog Short Text Mining Considering Context: A Method of Sentiment Analysis
Computer Science, 2021, 48(6A): 158-164. https://doi.org/10.11896/jsjkx.210200089
[9] 赵潇, 李仕林, 李凡, 余正涛, 张林华, 杨勇.
Double-cycle Consistent Insulator Defect Sample Generation Method Based on Local Fine-grained Information Guidance
Computer Science, 2021, 48(6A): 581-586. https://doi.org/10.11896/jsjkx.200500026
[10] 鲁博仁, 胡世哲, 娄铮铮, 叶阳东.
Character-level Feature Extraction Method for Railway Text Classification
Computer Science, 2021, 48(3): 220-226. https://doi.org/10.11896/jsjkx.200200061
[11] 纪南巡, 孙晓燕, 李祯其.
Fusion Vectorized Representation Learning of Multi-source Heterogeneous User-generated Contents
Computer Science, 2021, 48(10): 51-58. https://doi.org/10.11896/jsjkx.200900194
[12] 刘洋, 金忠.
Fine-grained Image Recognition Method Combining with Non-local and Multi-region Attention Mechanism
Computer Science, 2021, 48(1): 197-203. https://doi.org/10.11896/jsjkx.191000135
[13] 刘振鹏, 苏楠, 秦益文, 卢家欢, 李小菲.
FS-CRF: Outlier Detection Model Based on Feature Segmentation and Cascaded Random Forest
Computer Science, 2020, 47(8): 185-188. https://doi.org/10.11896/jsjkx.190600162
[14] 程婧, 刘娜娜, 闵可锐, 康昱, 王新, 周扬帆.
Word Embedding Optimization for Low-frequency Words with Applications in Short-text Classification
Computer Science, 2020, 47(8): 255-260. https://doi.org/10.11896/jsjkx.191000163
[15] 倪海清, 刘丹, 史梦雨.
Chinese Short Text Summarization Generation Model Based on Semantic-aware
Computer Science, 2020, 47(6): 74-78. https://doi.org/10.11896/jsjkx.190600006