Computer Science, 2026, Vol. 53, Issue (3): 383-391. doi: 10.11896/jsjkx.260200058
DU Jiantong1, GUAN Zeli2, XUE Zhe2
Abstract: To address the challenges of low visual feature discriminability, heavily colloquial text descriptions, and heterogeneous multimodal semantics in ophthalmic videos on social networks, this paper proposes a multi-task-learning-based method for ophthalmic video feature fusion and multi-dimensional profile construction (OVP), which mines medically meaningful multi-dimensional features from unstructured video and text streams to achieve accurate representation of ophthalmic videos. First, a pre-trained deep residual network extracts high-dimensional visual representations from video keyframes, capturing the fine-grained characteristics specific to ophthalmic images. Second, a text feature extraction method based on an ophthalmology knowledge graph is proposed: by retrieving and fusing external entity annotations and associated knowledge, it effectively compensates for the sparsity of professional semantics in social media text, and a BERT model then extracts domain-knowledge-rich text features. On this basis, a cross-modal attention fusion mechanism is designed that dynamically computes interaction weights between visual and textual features, achieving deep alignment of image information and medical semantics. Finally, multi-task joint optimization and ophthalmic multi-dimensional profiling are constructed by jointly training three subtasks, namely video disease classification, popularity prediction, and content quality assessment, exploiting information shared across tasks to improve generalization. Experiments on a real ophthalmic video dataset show that OVP significantly outperforms existing baselines in disease classification accuracy, popularity prediction, and quality assessment, validating its effectiveness for complex ophthalmic video feature fusion and multi-dimensional profile construction.
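The knowledge-graph-based text enrichment step described in the abstract can be sketched as follows. This is a minimal illustration only: the mini knowledge graph, entity names, and the `enrich` helper are invented here and are not from the paper, which retrieves annotations from a full ophthalmology knowledge graph before BERT encoding.

```python
# Sketch: enrich colloquial social-media text with knowledge-graph
# entity annotations before feeding it to a BERT-style encoder.
# The two-entry "knowledge graph" below is purely illustrative.
KG = {
    "飞蚊症": "floaters: vitreous opacities casting shadows on the retina",
    "白内障": "cataract: clouding of the crystalline lens",
}

def enrich(text: str, kg: dict) -> str:
    """Append definitions of every KG entity mentioned in the text,
    so sparse colloquial posts carry explicit medical semantics."""
    notes = [desc for entity, desc in kg.items() if entity in text]
    return text if not notes else text + " [KG] " + " ; ".join(notes)

enriched = enrich("医生说我有白内障怎么办", KG)
```

The enriched string, rather than the raw post, would then be tokenized and encoded by BERT.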
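The cross-modal attention fusion mechanism (text queries attending over visual keys and values) follows the standard scaled dot-product pattern. A hedged NumPy sketch, with randomly initialized projection matrices standing in for the learned ones:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, visual_feats, d_k, seed=0):
    """Text tokens (n_tokens, d) attend over keyframe features
    (n_frames, d); returns fused features and the attention weights.
    Projection matrices are random stand-ins for learned parameters."""
    rng = np.random.default_rng(seed)
    d = text_feats.shape[1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((visual_feats.shape[1], d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((visual_feats.shape[1], d_k)) / np.sqrt(d)
    Q, K, V = text_feats @ Wq, visual_feats @ Wk, visual_feats @ Wv
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (n_tokens, n_frames)
    return weights @ V, weights
```

Each row of `weights` sums to 1, so every text token receives a convex combination of frame features, which is the "dynamic interaction weight" behavior the abstract describes.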
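The multi-task joint optimization over the three subtasks is, in its simplest common form, a weighted sum of per-task losses on top of a shared encoder. The weights below are hypothetical hyperparameters, not values reported by the paper:

```python
def joint_loss(losses, weights):
    """Weighted sum of per-task losses for joint training.
    losses/weights: dicts keyed by task name; weights are
    hyperparameters balancing the three subtasks."""
    return sum(weights[t] * losses[t] for t in losses)

total = joint_loss(
    {"classification": 0.9, "popularity": 0.4, "quality": 0.6},
    {"classification": 1.0, "popularity": 0.5, "quality": 0.5},
)
```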