Computer Science ›› 2022, Vol. 49 ›› Issue (4): 80-87. doi: 10.11896/jsjkx.211100014

• Special Topic on Multidisciplinary Integration Based on Social Computing

Big Data-driven Based Socioeconomic Status Analysis: A Survey

YAO Xiao-ming1,2, DING Shi-chang3, ZHAO Tao4, HUANG Hong5, LUO Jar-der6, FU Xiao-ming1

  1 Institute of Computer Science, University of Goettingen, Goettingen 37077, Germany;
  2 Cloud Branch Big Data Department, China Telecom Co. Ltd, Beijing 100033, China;
  3 School of Cyberspace Security, State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 276800, China;
  4 College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China;
  5 College of Computer Science and Technology, Huazhong University of Science & Technology, Wuhan 430074, China;
  6 Department of Sociology, Tsinghua University, Beijing 100084, China
  • Received: 2021-10-29 Revised: 2022-02-16 Published: 2022-04-01
  • Corresponding author: FU Xiao-ming (fu@cs.uni-goettingen.de)
  • About author: YAO Xiao-ming (yaoxm@chinatelecom.cn), born in 1970, is technical director of the big data unit at the Cloud Branch, China Telecom. His main research interests include smart cities, mobile big data and data mining. FU Xiao-ming, born in 1973, Ph.D., professor, IEEE fellow, IET fellow, ACM distinguished scientist, is a member of Academia Europaea. His main research interests include networked systems, cloud computing and big data analytics.
  • Supported by:
    This work was supported by the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Grant Agreement (COSAFE, 824019) and the Chinese National Key R&D Program (2020YFE0200500).

Abstract: Socioeconomic status (SES) is an overall measure of a person's economic and social standing relative to others. It combines economic and sociological factors and covers multiple dimensions such as occupation, education, and income. Because assessing SES helps governments and other organizations make policies and decisions (for example, formulating social policy or personalizing advertising), it has received wide attention from researchers. With the recent development of big data technology and machine learning, data-driven approaches can fuse multidimensional data and apply various algorithms to estimate people's socioeconomic attributes (SEAs), and from them their SES, automatically. This addresses the difficult data collection and high cost of traditional methods. This paper summarizes recent progress in applying big data techniques to SES analysis. It first introduces the basic concept of SES and discusses the challenges that big data methods pose compared with traditional methods. It then systematically summarizes and classifies the state-of-the-art methods according to the information used in the learning process, and discusses the pros and cons of each type of method in detail. Finally, it discusses the open challenges and problems in inferring individual SES and provides an outlook on future research directions.

Key words: Data mining, Deep learning, Machine learning, Social media, Socioeconomic status

CLC number:

  • TP391