Computer Science, 2020, Vol. 47, Issue (2): 1-9. doi: 10.11896/jsjkx.190600180

• Database & Big Data & Data Science •

Survey on Representation Learning of Complex Heterogeneous Data

JIAN Song-lei, LU Kai

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2019-06-28 Online: 2020-02-15 Published: 2020-03-18
  • About author: JIAN Song-lei, born in 1991, Ph.D., assistant research fellow, is a member of the China Computer Federation (CCF). Her main research interests include representation learning, machine learning and complex network analysis. LU Kai, born in 1973, research fellow, Ph.D. supervisor, is a member of the China Computer Federation (CCF). His main research interests include parallel and distributed system software, operating systems and machine learning.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2018YFB0803501), the National High-level Personnel for Defense Technology Program (2017-JCJQ-ZQ-013), the National Natural Science Foundation of China (61902405) and the Hunan Province Science Foundation (2017RS3045).

Abstract: With the arrival of the era of artificial intelligence and big data, complex heterogeneous data of many kinds continue to emerge and form the basis of data-driven artificial intelligence methods and machine learning models. The quality of the data representation directly affects the performance of downstream learning algorithms, so learning useful representations of complex heterogeneous data is an important research area in machine learning. This survey first introduces several types of data representation and outlines the challenges facing representation learning methods. It then categorizes data by modality into single-type data and multi-type data. For single-type data, it reviews the research progress and typical representation learning algorithms for categorical data, network data, text data and image data. It then details multi-type data composed of multiple single-type data, including mixed data containing both categorical and continuous features, attributed network data containing node content and network topology, cross-domain data derived from different domains, and multimodal data containing multiple modalities, and reviews the research progress and state-of-the-art representation learning models for each. Finally, it discusses development trends in representation learning of complex heterogeneous data.

Key words: Attributed network, Categorical data, Cross-domain data, Machine learning, Multimodal data, Representation learning

CLC Number: 

  • TP181