Computer Science, 2022, Vol. 49, Issue (2): 107-115. doi: 10.11896/jsjkx.210600085
• Computer Vision: Theory and Application •
TAN Xin-yue, HE Xiao-hai, WANG Zheng-yong, LUO Xiao-dong, QING Lin-bo