Computer Science, 2025, Vol. 52, Issue (6A): 240400206-8. doi: 10.11896/jsjkx.240400206

• Artificial Intelligence

Review on Methods and Applications of Short Text Similarity Measurement in Social Media Platforms

FAN Xing (1), ZHOU Xiaohang (2, 3), ZHANG Ning (1)

  (1) School of Business, Qingdao University, Qingdao, Shandong 266000, China
  (2) Qingdao City University, Qingdao, Shandong 266000, China
  (3) School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200000, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About the authors: FAN Xing, born in 1999, postgraduate student. Her main research interests include online public opinion management and e-commerce.
    ZHANG Ning, born in 1980, Ph.D., professor. His main research interests include business data analysis and online public opinion management.
  • Supported by:
    Shandong Provincial Natural Science Foundation Project (ZR2022MG022).

Abstract: Short text similarity measurement is a fundamental task in natural language processing. As user activity on social media platforms grows, short texts have become the primary carrier of information dissemination on the internet. Such data is highly valuable to businesses, helping them gain insight into consumer sentiment and build accurate user profiles from big data. This paper systematically categorizes short text similarity measurement methods into three groups: string-based methods, vector-based methods, and deep learning methods, and discusses the advantages and limitations of each. It further emphasizes the practical applications of short text similarity in business analytics, showing how similarity measurement enables businesses to understand consumer opinions and attitudes and to refine their marketing strategies. Finally, it summarizes the challenges facing short text similarity measurement on social media platforms and outlines future directions, with the aim of providing useful references and insights for researchers in related fields.
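To make the difference between the first two categories concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation) of three classic measures: Levenshtein edit distance and a character-bigram Jaccard coefficient as string-based measures, and cosine similarity over bag-of-words term-frequency vectors as a simple vector-based measure. The function names, the bigram size n = 2, and the whitespace tokenization are assumptions chosen for illustration; Chinese social media text would normally need word segmentation before the bag-of-words step. Deep learning methods, which compare learned embeddings, are not shown.

```python
from collections import Counter
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_ngrams(text: str, n: int = 2) -> set:
    """Set of overlapping character n-grams (needs no word segmentation)."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient |A & B| / |A | B| over character n-gram sets."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors.
    Whitespace tokenization is an assumption suited to English text."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Two hypothetical social media posts with near-identical meaning.
    s1 = "great phone, battery lasts all day"
    s2 = "great phone and the battery lasts all day"
    print("edit distance:", levenshtein(s1, s2))
    print("bigram Jaccard:", round(jaccard(s1, s2), 3))
    print("BoW cosine:", round(cosine_bow(s1, s2), 3))
```

String-based scores respond only to surface overlap, so two posts that say the same thing in different words score low; vector-based and, especially, deep learning representations aim to capture that kind of semantic similarity.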

Key words: Short text similarity, Social media platforms, String-based, Vector-based, Deep learning, Sentiment analysis, User analysis

CLC Number: TP391