Computer Science, 2026, Vol. 53, Issue (4): 260-268. doi: 10.11896/jsjkx.250700172

• Computer Graphics & Multimedia •

Unsupervised Infrared Image Generation Method Based on Dual Semantic Contrastive Learning

CHENG Zimeng1,2, YANG Xinyue1,2, AI Haojun1,2, WANG Zhongyuan3

  1 School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
  2 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China
  3 School of Computer Science, Wuhan University, Wuhan 430079, China
  • Received: 2025-07-25 Revised: 2025-09-22 Published: 2026-04-15 Online: 2026-04-08
  • Corresponding author: AI Haojun (aihj@whu.edu.cn)
  • About author: CHENG Zimeng (zmcheng@whu.edu.cn), born in 2002, postgraduate, is a member of CCF (No. 02769G). Her main research interests include computer vision and image-to-image translation.
    AI Haojun, born in 1972, Ph.D., associate professor, is a senior member of CCF (No. 06059S). His main research interests include computer vision, artificial intelligence and deepfake detection.
  • Supported by:
    Hubei Province International Science and Technology Collaboration Program (2025EHA043).

Abstract: Infrared images are widely used in computer vision, but high-quality infrared image datasets remain small because acquisition conditions are restrictive. Converting visible-light images into infrared images is an effective way to expand infrared datasets. Existing generation methods mostly rely on supervised learning, which requires large amounts of paired data that are extremely difficult to obtain in practice. This paper therefore proposes DSCGAN, an unsupervised infrared image generation method based on dual semantic contrastive learning. The method adopts a bidirectional translation architecture and uses semantic contrastive learning to strengthen both content preservation and the learning of discriminative infrared features. A geometric consistency loss is added to the loss function to help preserve the original structure and details of the visible images. In addition, a multi-scale PatchGAN discriminator is constructed to improve discriminative capability and the realism of the generated images. Experiments on the AVIID-1, AVIID-2, and Day-DroneVehicle datasets show that DSCGAN outperforms the comparison methods on several metrics, and that the generated infrared images exhibit a more plausible thermal radiation distribution and better visual quality. On the AVIID-1 dataset, the SSIM rises to 0.8144 and the FID falls to 0.1456. On the Day-DroneVehicle dataset, the PSNR rises to 18.14 and the LPIPS falls to 0.2949. The method offers a new approach to unsupervised infrared image generation, with potential applications in downstream tasks such as infrared target detection and infrared scene segmentation.

Key words: Image-to-image translation, Semantic contrastive learning, Infrared image generation, Multi-scale discriminator, Geometric consistency constraint
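
The abstract names semantic contrastive learning but does not give the loss itself. As a rough illustration only, the sketch below shows the generic InfoNCE objective that is commonly used for patch-wise contrastive learning in unpaired image translation. It is not taken from the paper and should not be read as DSCGAN's actual loss. The function name `info_nce`, the toy two-dimensional feature vectors, and the temperature `tau=0.07` are all assumptions made for this example.

```python
import math


def info_nce(query, positive, negatives, tau=0.07):
    """Generic InfoNCE loss for a single query feature vector.

    query     -- feature of a patch in the generated image (list of floats)
    positive  -- feature of the corresponding patch in the input image
    negatives -- features of other, non-corresponding patches
    tau       -- temperature; 0.07 is an assumed, commonly used value
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    q = normalize(query)
    # Cosine similarities scaled by the temperature; the positive is index 0.
    logits = [dot(q, normalize(positive)) / tau]
    logits += [dot(q, normalize(n)) / tau for n in negatives]

    # Cross-entropy against index 0, using a numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# Hypothetical toy features: the loss is small when the query matches its
# positive patch and large when it matches a negative patch instead.
aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

Minimizing such a loss pulls the feature of each generated patch toward the feature of the input patch at the same location and pushes it away from the other patches, which is one common way to encourage content preservation without paired data.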

CLC Number: TP391.41