Computer Science (计算机科学), 2021, Vol. 48, Issue (10): 51-58. doi: 10.11896/jsjkx.200900194

• Artificial Intelligence

Fusion Vectorized Representation Learning of Multi-source Heterogeneous User-generated Contents

JI Nan-xun, SUN Xiao-yan, LI Zhen-qi

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China
  • Received: 2020-09-27 Revised: 2021-01-08 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: SUN Xiao-yan (xysun78@126.com)
  • About author: JI Nan-xun (jinanxun@cumt.edu.cn), born in 1994, postgraduate. His main research interests include natural language processing and machine learning.
    SUN Xiao-yan, born in 1978, Ph.D., professor. Her main research interests include interactive evolutionary computation, big data and intelligence optimization.
  • Supported by:
    National Natural Science Foundation of China (61876184).

Abstract: With the development of mobile networks and apps, user-generated content (UGC), which comprises multi-source heterogeneous data such as reviews, tags, ratings, images and videos, has become an important basis for improving the quality of personalized services. Fusing these data and learning a representation of them is the key to putting them to use. To this end, we propose a fusion representation learning method for multi-source text and images. Doc2vec and LDA models are used to produce vectorized representations of the multi-source text, and a deep convolutional network extracts features from the images associated with the review text. We present a multi-strategy fusion mechanism for the vectorized text representations, together with a convolution-based fusion of the text and image representations. The method is applied to public Amazon product data sets containing UGC, where the vectorized multi-source heterogeneous UGC serves as the representation of each product. The resulting product classification accuracy demonstrates the feasibility and effectiveness of the method.

Key words: User-generated content, Representation learning, Multi-source heterogeneous, Fusion, Short text

CLC Number:

  • TP391
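
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of what fusing text vectors and then combining them with image features by convolution could look like. It is not the authors' implementation. The two text fusion strategies shown (concatenation and weighted sum), the use of a full discrete linear convolution for the text-image step, all function names, and the toy vectors standing in for Doc2vec, LDA and CNN outputs are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def concat_fusion(doc_vec: Vector, topic_vec: Vector) -> Vector:
    """Fuse two text vectors by concatenation (one assumed strategy)."""
    return list(doc_vec) + list(topic_vec)


def weighted_sum_fusion(a: Vector, b: Vector, alpha: float = 0.5) -> Vector:
    """Fuse two equal-length vectors as alpha * a + (1 - alpha) * b."""
    if len(a) != len(b):
        raise ValueError("weighted sum needs vectors of equal length")
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def linear_convolution(x: Vector, h: Vector) -> Vector:
    """Full discrete linear convolution; output length is len(x) + len(h) - 1."""
    if not x or not h:
        raise ValueError("inputs must be non-empty")
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


# Hypothetical toy vectors; real ones would come from trained models.
doc_vec = [0.2, -0.1, 0.4]   # stand-in for a Doc2vec embedding
topic_vec = [0.7, 0.3]       # stand-in for an LDA topic distribution
image_vec = [1.0, 0.5]       # stand-in for CNN image features

text_vec = concat_fusion(doc_vec, topic_vec)
fused = linear_convolution(text_vec, image_vec)
print(len(text_vec), len(fused))
```

In this sketch the fused vector would be passed to a downstream classifier as the product representation. Concatenation keeps both text views intact at the cost of a longer vector, while the weighted sum needs the two vectors to have the same dimension.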