Computer Science, 2021, Vol. 48, Issue (10): 51-58. doi: 10.11896/jsjkx.200900194

• Artificial Intelligence

Fusion Vectorized Representation Learning of Multi-source Heterogeneous User-generated Contents

JI Nan-xun, SUN Xiao-yan, LI Zhen-qi

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China
  • Received: 2020-09-27 Revised: 2021-01-08 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: SUN Xiao-yan (xysun78@126.com)
  • About author: JI Nan-xun (jinanxun@cumt.edu.cn), born in 1994, postgraduate. His main research interests include natural language processing and machine learning.
    SUN Xiao-yan, born in 1978, Ph.D., professor. Her main research interests include interactive evolutionary computation, big data and intelligent optimization.
  • Supported by:
    National Natural Science Foundation of China (61876184).

Abstract: With the development of mobile networks and apps, user-generated content (UGC) comprising multi-source heterogeneous data such as reviews, tags, ratings, images and videos has become an important basis for improving the quality of personalized services. Fusing these data and learning a representation of them is key to applying them. To this end, we propose a fusion representation learning method for multi-source text and images. Doc2vec and LDA models are used to produce vectorized representations of the multi-source text, and a deep convolutional network extracts the features of images associated with the review text. We present a multi-strategy fusion mechanism for the vectorized text representations, together with a representation learning scheme that fuses text and images by convolution. The proposed algorithm is applied to Amazon product data sets containing UGC, where the vectorized UGC serves as the representation of each product; the resulting product classification accuracy demonstrates the feasibility and effectiveness of the method.

Key words: Fusion, Multi-source heterogeneous, Representation learning, Short text, User generated contents
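
The fusion pipeline outlined in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: random vectors stand in for the Doc2vec embedding, the LDA topic distribution and the CNN image features; the two text-fusion strategies (concatenation and weighted sum) are hypothetical examples of what a multi-strategy mechanism might include; and the text-image step uses a plain 1-D discrete linear convolution via `numpy.convolve`.

```python
import numpy as np


def l2_normalize(x):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(x)
    return x if norm == 0 else x / norm


def fuse_text_concat(doc_vec, topic_vec):
    """Hypothetical strategy A: concatenate the normalized text vectors."""
    return np.concatenate([l2_normalize(doc_vec), l2_normalize(topic_vec)])


def fuse_text_weighted(doc_vec, topic_vec, alpha=0.5):
    """Hypothetical strategy B: weighted sum; needs equal dimensions."""
    if doc_vec.shape != topic_vec.shape:
        raise ValueError("weighted fusion requires vectors of equal shape")
    return alpha * l2_normalize(doc_vec) + (1 - alpha) * l2_normalize(topic_vec)


def fuse_text_image_conv(text_vec, image_vec):
    """Assumed fusion step: 1-D discrete linear convolution.

    The output has length len(text_vec) + len(image_vec) - 1.
    """
    return np.convolve(text_vec, image_vec, mode="full")


# Stand-in inputs (assumptions): real vectors would come from trained models.
rng = np.random.default_rng(0)
doc_vec = rng.normal(size=8)            # placeholder for a Doc2vec embedding
topic_vec = rng.dirichlet(np.ones(4))   # placeholder for an LDA topic distribution
image_vec = rng.normal(size=6)          # placeholder for CNN image features

text_vec = fuse_text_concat(doc_vec, topic_vec)
fused = fuse_text_image_conv(text_vec, image_vec)
print(text_vec.shape, fused.shape)
```

In a setup like this, the fused vector would be passed to a downstream classifier as the product representation; which fusion strategy works best is an empirical question that the sketch does not address.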

CLC Number:

  • TP391