Computer Science ›› 2024, Vol. 51 ›› Issue (11): 112-132.doi: 10.11896/jsjkx.231100089
• Computer Graphics & Multimedia •
WANG Shuaiwei¹, LEI Jie¹, FENG Zunlei², LIANG Ronghua¹