Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250400114-9. doi: 10.11896/jsjkx.250400114

• Image Processing & Multimedia Technology •

Q&A Model for Agricultural Diseases Based on Transformer

DUAN Pengsong, LUO Yu, WANG Chao

  1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: DUAN Pengsong, born in 1983, Ph.D., associate professor. His main research interests include wireless sensing, IoT, and machine learning.
    WANG Chao, born in 1988, Ph.D., lecturer. His main research interests include edge intelligence, machine intelligence, and human-computer interaction.
  • Supported by:
    Collaborative Education Project of the Ministry of Education (231103873161120), Science and Technology Research and Development Project of Henan Province (232102210050, 242102210060), Natural Science Foundation of Henan Province (222300420295, 242300421474), and Key Research and Development Project of Zhengzhou City's "Leading the Way" System (20230071A).

Abstract: To address the insufficient recognition accuracy of agricultural pest and disease identification and the absence of generated control recommendations, this paper proposes an automatic question-answering model that combines computer vision techniques with instruction tuning. An improved Vision Transformer (ViT) classifies images of crop pests and diseases; it incorporates an asymmetric convolution embedding module and a channel attention mechanism to strengthen feature extraction and improve classification accuracy on large-scale datasets. Based on the classification results, the Baichuan large language model is instruction-tuned with LoRA (Low-Rank Adaptation) to generate more precise and practical prevention and control recommendations, improving the model's applicability in agricultural scenarios. All experiments are conducted on the Huawei MindSpore deep learning framework, using the Ascend 910 NPU for model training and inference. Experimental results show that combining the improved ViT with instruction fine-tuning significantly improves classification accuracy and produces actionable prevention and control recommendations.

Key words: Image classification, Agricultural pests and diseases, Large models, Instruction fine-tuning, Domestic AI framework

CLC Number: TP391
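
The abstract mentions fine-tuning the Baichuan model with LoRA but gives no implementation detail. As a rough illustration of the general LoRA idea only, and not the authors' MindSpore code, the sketch below keeps a weight matrix W frozen and adds a trainable low-rank update scaled by alpha / r. The layer sizes, the rank r, the value of alpha, and every function name are hypothetical choices made for this example.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(x, w, a, b, alpha):
    """Return W x + (alpha / r) * B (A x); W stays frozen, A and B are trainable."""
    r = len(a)                       # rank of the low-rank update
    base = matvec(w, x)              # frozen pretrained path
    low = matvec(b, matvec(a, x))    # low-rank path, never forms B A explicitly
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, low)]


def trainable_fraction(d_out, d_in, r):
    """Ratio of LoRA parameters (A and B) to the parameters of the full matrix W."""
    return r * (d_in + d_out) / (d_out * d_in)


# Hypothetical toy sizes; a real language model layer is far larger.
random.seed(0)
d_in, d_out, r, alpha = 8, 6, 2, 4.0
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # A: small random init
b = [[0.0] * r for _ in range(d_out)]                                 # B: zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(w, x)
y_lora = lora_forward(x, w, a, b, alpha)
```

Because B starts at zero, the adapted layer reproduces the frozen layer exactly before any training step, so fine-tuning begins from the pretrained behavior. For a hypothetical 4096 x 4096 projection with r = 8, `trainable_fraction(4096, 4096, 8)` evaluates to 1/256, which is the sense in which the method is parameter-efficient.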