Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700029-7. doi: 10.11896/jsjkx.240700029

• Intelligent Medical Engineering •

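Study on Diagnosis Model of Livestock and Poultry Disease Based on Improved TF-IIGM Algorithm

GUO Xiaoli1,2,3, LI Qifeng1,3, LIU Yu1,3, ZHANG Jun1,3, ZHAO Hongtao2, YANG Gan1,3, JIANG Ruixiang1,3, YU Ligen1,3

  1. 1 Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097
    2 School of Mathematics and Physics, North China Electric Power University, Beijing 102206
    3 Innovation Center of National Digital Livestock, Beijing 100097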
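  • Online: 2025-06-16 Published: 2025-06-12
  • Corresponding author: YU Ligen (yulg@nercita.org.cn)
  • About author: GUO Xiaoli (guoxiaoli06@163.com)
  • Supported by:
    National Key R&D Program of China (2023YFD1300805); Yunnan Province Major Science and Technology Special Program (202102AE090039); Beijing Academy of Agriculture and Forestry Sciences Innovation Capacity Building Project (KJCX20230204); Research on the Development Strategy of Modern Animal Husbandry in Inner Mongolia (2023NM2N-01)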

Study on Diagnosis Model of Livestock and Poultry Disease Based on Improved TF-IIGM Algorithm

GUO Xiaoli1,2,3, LI Qifeng1,3, LIU Yu1,3, ZHANG Jun1,3, ZHAO Hongtao2, YANG Gan1,3, JIANG Ruixiang1,3, YU Ligen1,3   

  1. 1 Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    2 School of Mathematics and Physics, North China Electric Power University, Beijing 102206, China
    3 Innovation Center of National Digital Livestock, Beijing 100097, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: GUO Xiaoli, born in 1998, postgraduate. Her main research interests include NLP and diagnosis of livestock and poultry diseases.
    YU Ligen, born in 1985, Ph.D, professor. His main research interests include intelligent diagnosis of livestock and poultry diseases.
  • Supported by:
    National Key R&D Program of China (2023YFD1300805), Yunnan Province Major Science and Technology Special Program (202102AE090039), Beijing Academy of Agriculture and Forestry Sciences Innovation Capacity Building Project (KJCX20230204) and Research on the Development Strategy of Modern Animal Husbandry in Inner Mongolia (2023NM2N-01).

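Abstract: To address the low diagnostic accuracy caused by inaccurate weighting of feature terms in livestock and poultry disease texts, the proposed TF-IIGM-NW (Term Frequency-Improved Inverse Gravity Moment With Normalization and Weighting) algorithm is combined with Word2vec word vectors to produce vectorized text representations. Building on the TF-IIGM (Term Frequency-Improved Inverse Gravity Moment) algorithm, the method normalizes the weights and applies rules derived from keyword extraction algorithms to further raise the weights of the core keywords in a text. The vectorized text representation obtained by combining these weights with Word2vec word vectors is then fed into a support vector machine (SVM) for livestock and poultry disease diagnosis. To verify its effectiveness, the improved algorithm is compared with common word-vector processing methods on a self-built sheep disease text dataset. The results show that TF-IIGM-NW reaches a macro-F1 of 96.73% and a micro-F1 of 96.76%, which are 2.25% and 2.26% higher, respectively, than the classic TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, and 0.90% and 0.97% higher than TF-IIGM. The improved algorithm can effectively improve disease diagnosis performance. A per-class analysis of the SVM results shows that the sheep oral aphthae class is the most easily misclassified.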

Key words: TF-IIGM, Weight, Vectorized representation, Disease diagnosis, SVM

Abstract: To address the low diagnostic accuracy caused by inaccurate weight allocation of feature items in livestock and poultry disease texts, the improved TF-IIGM-NW algorithm is combined with Word2vec word vectors to realize text vectorization. Building on the TF-IIGM weighting method, the approach normalizes the weights and applies rules based on keyword extraction algorithms to further raise the weight of core keywords in the texts. The text vectorization results obtained by combining these weights with Word2vec word vectors are then input into a support vector machine (SVM) for diagnosis of livestock and poultry diseases. To verify the effectiveness of the improved algorithm, it is compared with commonly used word-vector processing methods on a self-built text dataset of sheep diseases. The results show that the macro-F1 and micro-F1 values of the TF-IIGM-NW algorithm are 96.73% and 96.76%, respectively, which are 2.25% and 2.26% higher than those of the commonly used TF-IDF algorithm, and 0.90% and 0.97% higher than those of the TF-IIGM weighting method. The improved algorithm can effectively improve the performance of disease diagnosis. Analysis of the SVM results for each disease type shows that sheep oral aphthae is the most easily misjudged class.

Key words: TF-IIGM, Weighting, Vectorization, Disease diagnosis, SVM
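
The abstract describes combining per-term weights with Word2vec word vectors to obtain a text vector, but it does not state the exact combination rule. A common choice is a weighted average of the word vectors, and the minimal sketch below assumes that rule. The function name `weighted_text_vector`, the two-dimensional toy vectors, and the weight values are all hypothetical illustrations; they are not the TF-IIGM-NW weights or data from the paper.

```python
def weighted_text_vector(tokens, word_vectors, term_weights):
    """Return the weighted average of the word vectors of `tokens`.

    Assumption: the text vector is sum(w_i * v_i) / sum(w_i). Tokens that are
    missing from either lookup table are skipped. `word_vectors` must be
    non-empty so that the vector dimension can be read from it.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors or tok not in term_weights:
            continue
        w = term_weights[tok]
        vec = word_vectors[tok]
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no usable tokens: fall back to the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy inputs (not taken from the paper).
word_vectors = {"fever": [1.0, 0.0], "cough": [0.0, 1.0], "the": [0.5, 0.5]}
term_weights = {"fever": 3.0, "cough": 1.0, "the": 0.0}
doc = ["the", "fever", "cough", "fever"]

vec = weighted_text_vector(doc, word_vectors, term_weights)
print(vec)
```

Because a repeated token contributes once per occurrence, term frequency is reflected implicitly, and a term with weight zero drops out of the average. The resulting vector would then serve as the feature input to a classifier such as an SVM.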

CLC Number: 

  • TP391
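
The abstract reports both macro-F1 and micro-F1. As a reminder of how the two differ, the sketch below computes them from scratch for a single-label, multi-class problem: macro-F1 averages the per-class F1 scores, while micro-F1 pools the true positives, false positives, and false negatives over all classes first. The function name `f1_scores` and the toy labels are hypothetical and unrelated to the paper's dataset.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy labels (not from the paper).
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(macro_f1, micro_f1)
```

In the single-label setting micro-F1 equals overall accuracy, so a gap between the macro and micro values indicates that performance differs across classes.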