Computer Science (计算机科学), 2021, Vol. 48, Issue 10: 51-58. doi: 10.11896/jsjkx.200900194
纪南巡, 孙晓燕, 李祯其
JI Nan-xun, SUN Xiao-yan, LI Zhen-qi
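Abstract: With the development of mobile networks and apps, user-generated content (UGC), which comprises multi-source heterogeneous data such as user reviews, tags, ratings, images, and videos, has become an important basis for improving the quality of personalized services. Fusing these data and learning representations of them is therefore key to putting UGC to use. To this end, a fusion representation learning method for multi-source text and images is proposed. Doc2vec and LDA models are used to produce vectorized representations of the multi-source text, and a deep convolutional network extracts image features related to the review text. The method then provides a multi-strategy fusion mechanism for the vectorized text representations, together with representation learning based on convolutional fusion of text and image features. The proposed algorithm is applied to an Amazon product dataset containing UGC, and the classification accuracy obtained on items represented by the learned UGC vectors demonstrates the feasibility and effectiveness of the algorithm.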
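The abstract does not spell out which fusion strategies are used or the exact form of the text-image convolutional fusion. As a minimal sketch, assuming that the multi-strategy fusion includes simple concatenation and element-wise averaging of the per-source text vectors, and that convolutional fusion means 1-D discrete convolution of the fused text vector with an image feature vector, the Python below shows how these operations could fit together. The function names, the strategy labels, and the toy vectors standing in for Doc2vec, LDA, and CNN outputs are all hypothetical, not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def concat_fusion(*vectors: Vector) -> List[float]:
    """Assumed fusion strategy 1: concatenate the per-source text vectors."""
    fused: List[float] = []
    for v in vectors:
        fused.extend(float(x) for x in v)
    return fused


def mean_fusion(*vectors: Vector) -> List[float]:
    """Assumed fusion strategy 2: element-wise mean of equal-length vectors."""
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("mean fusion needs vectors of equal length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def linear_convolution(a: Vector, b: Vector) -> List[float]:
    """Full 1-D linear convolution; output length is len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def circular_convolution(a: Vector, b: Vector) -> List[float]:
    """N-point circular convolution of two length-N vectors (indices wrap mod N)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("circular convolution needs vectors of equal length")
    return [sum(a[k] * b[(m - k) % n] for k in range(n)) for m in range(n)]


def fuse_text_image(doc2vec_vec: Vector, lda_vec: Vector, image_vec: Vector,
                    strategy: str = "concat") -> List[float]:
    """Hypothetical pipeline: fuse the two text vectors, then convolve with image features."""
    if strategy == "concat":
        text_vec = concat_fusion(doc2vec_vec, lda_vec)
    elif strategy == "mean":
        text_vec = mean_fusion(doc2vec_vec, lda_vec)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return linear_convolution(text_vec, image_vec)


if __name__ == "__main__":
    # Toy stand-ins for a Doc2vec vector, an LDA topic vector, and a CNN image feature.
    print(fuse_text_image([0.5, 1.0], [0.2, 0.8], [1.0, -1.0], strategy="concat"))
```

Circular convolution equals the linear result with its tail wrapped around modulo N; zero-padding both inputs to at least len(a) + len(b) - 1 makes the two coincide. Either could therefore serve as the fusion operator: the linear form keeps every cross term at the cost of a longer fused vector, while the unpadded circular form keeps the length fixed but mixes the wrapped terms.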