Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700029-7. doi: 10.11896/jsjkx.240700029
郭晓利1,2,3, 李奇峰1,3, 刘羽1,3, 张俊1,3, 赵红涛2, 杨淦1,3, 蒋瑞祥1,3, 余礼根1,3
GUO Xiaoli1,2,3, LI Qifeng1,3, LIU Yu1,3, ZHANG Jun1,3, ZHAO Hongtao2, YANG Gan1,3, JIANG Ruixiang1,3, YU Ligen1,3
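Abstract: To address the low diagnostic accuracy caused by inaccurate weighting of feature terms in livestock and poultry disease texts, this paper proposes an improved algorithm, TF-IIGM-NW (Term Frequency-Improved Inverse Gravity Moment With Normalization and Weighting), and combines it with Word2vec word vectors to build a vectorized text representation. Building on the TF-IIGM (Term Frequency-Improved Inverse Gravity Moment) algorithm, the method normalizes the term weights and applies rules derived from keyword extraction algorithms to further raise the weights of the core keywords in each text. The resulting weights are combined with Word2vec word vectors to obtain the text representation, which is then fed into a support vector machine (SVM) for livestock and poultry disease diagnosis. To verify the effectiveness of the algorithm, the improved algorithm is compared with common existing ways of processing word vectors on a self-built sheep disease text dataset. The results show that the macro-F1 and micro-F1 scores obtained with TF-IIGM-NW reach 96.73% and 96.76%, respectively. Compared with the classic TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, these scores improve by 2.25% and 2.26%, respectively; compared with TF-IIGM, they improve by 0.90% and 0.97%, respectively. The improved algorithm therefore effectively improves disease diagnosis performance. An analysis of the SVM results for each disease class shows that orf (contagious ecthyma, 羊口疮) is the class most often misclassified.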
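The weighting-and-vectorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the TF-IIGM-NW formula, so the sketch assumes that raw term weights are already available, that normalization means dividing by the largest weight, and that the keyword rule is a fixed multiplicative boost. The function names, the `boost` parameter, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Set


def normalize_and_boost(
    raw_weights: Dict[str, float],
    keywords: Set[str],
    boost: float = 1.5,
) -> Dict[str, float]:
    """Scale weights into [0, 1], then raise the weights of core keywords.

    Assumption: max-scaling and a constant boost factor stand in for the
    normalization and keyword rules, which the abstract does not specify.
    """
    top = max(raw_weights.values(), default=0.0)
    if top == 0.0:
        return {term: 0.0 for term in raw_weights}
    normalized = {term: w / top for term, w in raw_weights.items()}
    return {
        term: w * boost if term in keywords else w
        for term, w in normalized.items()
    }


def document_vector(
    tokens: List[str],
    weights: Dict[str, float],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Sum the word vectors of a document, each scaled by its term weight.

    Tokens missing from either the weight table or the embedding table
    are skipped. `embeddings` is assumed to be non-empty.
    """
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for token in tokens:
        if token in weights and token in embeddings:
            for i, value in enumerate(embeddings[token]):
                vector[i] += weights[token] * value
    return vector


# Toy data: two-dimensional stand-ins for Word2vec vectors.
toy_embeddings = {"fever": [1.0, 0.0], "lesion": [0.0, 1.0]}
toy_raw_weights = {"fever": 2.0, "lesion": 4.0}

toy_weights = normalize_and_boost(toy_raw_weights, keywords={"fever"}, boost=2.0)
toy_vector = document_vector(["fever", "lesion"], toy_weights, toy_embeddings)
```

In a full pipeline, one such vector per document would serve as the feature row passed to the SVM classifier; the dictionaries here would be replaced by the trained Word2vec model and the computed TF-IIGM-NW weights.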