Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Content of the Database & Big Data & Data Science section of our journal
    Sequential Recommendation Based on Multi-space Attribute Information Fusion
    WANG Zihong, SHAO Yingxia, HE Jiyuan, LIU Jinbao
    Computer Science    2024, 51 (3): 102-108.   DOI: 10.11896/jsjkx.230600078
    The goal of sequential recommendation is to model users' dynamic interests from their historical behaviors and hence to make recommendations related to those interests.Recently,attribute information has been shown to improve the performance of sequential recommendation.Many efforts have been made to fuse attribute information into sequential recommendation,with some success,but deficiencies remain.First,existing methods either do not explicitly model user preferences for attribute information or model only a single attribute-preference vector,which cannot fully express user preferences.Second,their fusion of attribute information does not consider the influence of personalized user information.To address these deficiencies,this paper proposes sequential recommendation based on multi-space attribute information fusion (MAIF-SR),with a multi-space attribute information fusion framework that fuses attribute information sequences in different attribute information spaces and models user preferences for each attribute,fully expressing user preferences through multi-dimensional interests.A personalized attribute attention mechanism is designed to introduce personalized user information during fusion,enhancing the personalized effect of the fused information.Experimental results on two public datasets and one industrial private dataset show that MAIF-SR is superior to other compared sequential recommendation models based on attribute information fusion.
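As a rough illustration of the personalized attribute attention idea described in the abstract, the sketch below weights several attribute-space vectors by their affinity to a user embedding. The function name, dimensions, and data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def personalized_attribute_attention(user_vec, attr_vecs):
    """Fuse per-attribute-space vectors with weights conditioned on the user.

    user_vec:  (d,) user embedding
    attr_vecs: (k, d) one vector per attribute space (e.g. category, brand)
    Returns a (d,) personalized fused attribute representation.
    """
    scores = attr_vecs @ user_vec            # user-conditioned relevance per attribute space
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ attr_vecs               # personalized weighted fusion

rng = np.random.default_rng(0)
user = rng.normal(size=8)
attrs = rng.normal(size=(3, 8))
fused = personalized_attribute_attention(user, attrs)
```

Note that with a zero (uninformative) user vector the softmax weights are uniform and the fusion degenerates to a plain average of the attribute vectors, which is exactly the non-personalized baseline the paper improves upon.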
    MMOS: Memory Resource Sharing Methods to Support Overselling in Multi-tenant Databases
    XU Haiyang, LIU Hailong, YANG Chaoyun, WANG Shuo, LI Zhanhuai
    Computer Science    2024, 51 (2): 27-35.   DOI: 10.11896/jsjkx.231000141
    This paper presents an oversold memory resource sharing method for multi-tenant databases in an online analytical processing scenario.The current static resource allocation strategy,which assigns a fixed resource quota to each tenant,leads to suboptimal resource utilization.To enhance resource utilization and platform revenue,it is important to share unused free resources among tenants without impacting their performance.Existing resource sharing methods for multi-tenant databases focus primarily on CPU resources;memory resource sharing methods that support overselling are lacking.To address this gap,the paper introduces a novel approach,MMOS,which accurately forecasts the memory requirement interval of each tenant and dynamically adjusts resource allocation based on the upper limit of the interval.This allows for efficient management of free memory resources,enabling support for more tenants and achieving memory overselling while maintaining optimal performance.Experimental results demonstrate the effectiveness of the proposed method in dynamically changing tenant load scenarios.With different resource pools,the number of supported tenants can be increased by 2 to 2.6 times,leading to a significant increase in peak resource utilization of 175% to 238%.Importantly,the proposed method ensures that the business and performance of each tenant remain unaffected.
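The core idea, forecast each tenant's demand interval and allocate its upper bound instead of a static quota, can be sketched as below. The quantile-plus-margin forecaster and greedy admission are simplified assumptions, not MMOS's actual prediction model.

```python
import numpy as np

def upper_bound(history, q=0.95, margin=1.1):
    """Forecast the upper limit of a tenant's memory demand interval from
    recent usage samples (a high quantile plus a safety margin)."""
    return float(np.quantile(history, q) * margin)

def admit_tenants(histories, pool_mb):
    """Greedily admit tenants while the sum of forecast upper bounds fits the
    pool -- the bounds, not the static quotas, drive the allocation."""
    admitted, used = [], 0.0
    for tenant, hist in histories.items():
        need = upper_bound(hist)
        if used + need <= pool_mb:
            admitted.append(tenant)
            used += need
    return admitted, used

histories = {                        # MB usage samples; illustrative only
    "a": [400, 420, 380, 450],
    "b": [900, 950, 870, 910],
    "c": [300, 310, 290, 305],
}
admitted, used = admit_tenants(histories, pool_mb=2000)
```

Because the forecast bounds are far below the tenants' static peak quotas would be, the same pool admits more tenants, which is the oversell effect the abstract quantifies.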
    Multivariate Time Series Classification Algorithm Based on Heterogeneous Feature Fusion
    QIAO Fan, WANG Peng, WANG Wei
    Computer Science    2024, 51 (2): 36-46.   DOI: 10.11896/jsjkx.230100135
    With the advance of big data and sensors,multivariate time series classification has become an important problem in data mining.Multivariate time series are characterized by high dimensionality,complex inter-dimensional relations,and variable data forms,which leads classification methods to generate huge feature spaces;selecting discriminative features is difficult,resulting in low accuracy and hindering interpretability.Therefore,a multivariate time series classification algorithm based on heterogeneous feature fusion is proposed in this paper.The proposed algorithm integrates time-domain,frequency-domain,and interval-based features.Firstly,a small number of representative features of different types are extracted for each dimension.Then,the features of all dimensions are fused by multivariate feature transformation to learn the classifier.For univariate feature extraction,the algorithm generates different types of feature candidates based on a tree structure,and a clustering algorithm is designed to aggregate redundant and similar features into a small number of representative features,which effectively reduces the number of features and enhances the interpretability of the method.To verify the effectiveness of the algorithm,extensive experiments are conducted on the public UEA datasets,and the proposed algorithm is compared with existing multivariate time series classification methods.The results show that the proposed algorithm is more accurate than the comparison methods and that the feature fusion is reasonable.Moreover,the interpretability of the classification results is demonstrated through a case study.
    Fusion Model of Housekeeping Service Course Recommendation Based on Knowledge Graph
    ZOU Chunling, ZHU Zhengzhou
    Computer Science    2024, 51 (2): 47-54.   DOI: 10.11896/jsjkx.221200149
    Housekeeping service practitioners' demand for online learning of housekeeping service courses has increased.However,existing online learning websites for housekeeping service courses have few resources,insufficiently systematic courses,and no course recommendation function,which raises the threshold of online learning for housekeeping service practitioners.Based on an analysis of existing online learning websites for housekeeping service courses,this paper proposes constructing a knowledge graph of housekeeping service courses,integrating it with a recommendation algorithm,and designing R-RippleNet,a course recommendation model that combines a rule model with deep-learning-based ripple (water-wave) preference propagation.R-RippleNet serves both returning students and new students:courses are recommended to returning students with the ripple preference propagation model and to new students with the rule model.Experimental results show that for returning students the R-RippleNet model achieves an AUC of 95%,an ACC of 89%,and an F1 of 89%,while for new students the mean overall accuracy is 77% and the mean NDCG is 93%.
    Knowledge Graph and User Interest Based Recommendation Algorithm
    XU Tianyue, LIU Xianhui, ZHAO Weidong
    Computer Science    2024, 51 (2): 55-62.   DOI: 10.11896/jsjkx.221200169
    To solve the cold start and data sparsity problems of collaborative filtering recommendation algorithms,knowledge graphs,with their rich semantic and path information,are introduced in this paper.Owing to the graph structure,recommendation algorithms that apply graph neural networks to knowledge graphs are favored by researchers.The core of such recommendation algorithms is to obtain item features and user features;however,research in this area focuses on better expressing item features while ignoring the representation of user features.Based on graph neural networks,a recommendation algorithm built on knowledge graphs and user interest is proposed.The algorithm constructs user interest with an independent user interest capture module that learns user historical information and models user interest,so that both users and items are well represented.Experimental results show that on the MovieLens dataset,the proposed algorithm makes full use of the data,performs well,and improves recommendation accuracy.
    Time Series Clustering Method Based on Contrastive Learning
    YANG Bo, LUO Jiachen, SONG Yantao, WU Hongtao, PENG Furong
    Computer Science    2024, 51 (2): 63-72.   DOI: 10.11896/jsjkx.221200038
    Deep clustering methods rely heavily on complex feature extraction networks and clustering algorithms,and it is difficult for them to define the similarity between time series intuitively.Contrastive learning can define the similarity of time series from the perspective of positive and negative samples and jointly optimize feature extraction and clustering.Based on contrastive learning,this paper proposes a time series clustering model that does not rely on a complex representation network.To solve the problem that existing time series data augmentation methods cannot describe the transformation invariance of time series,this paper proposes a new data augmentation method that captures the similarity of sequences while ignoring the time-domain characteristics of the data.The proposed clustering model constructs positive and negative sample pairs by setting different shape transformation parameters,learns feature representations,and uses cross-entropy loss to maximize the similarity of positive sample pairs and minimize that of negative sample pairs in instance-level and cluster-level contrasts.The model jointly learns feature representations and cluster assignments in an end-to-end fashion.Extensive experiments on 32 UCR datasets show that the proposed model obtains performance equal to or better than existing methods without relying on a specific representation learning network.
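The instance-level contrast described above can be sketched with a generic NT-Xent-style loss over two augmented views of a batch; this NumPy sketch illustrates the contrastive-learning mechanism in general, not the paper's exact objective or its shape-transformation augmentation.

```python
import numpy as np

def instance_contrastive_loss(z_a, z_b, temperature=0.5):
    """NT-Xent-style instance-level loss: each sample's augmented view is its
    positive; every other sample in the batch acts as a negative."""
    z = np.concatenate([z_a, z_b])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # compare by cosine similarity
    sim = z @ z.T / temperature
    n2, loss = len(z), 0.0
    for i in range(n2):
        pos = sim[i, (i + len(z_a)) % n2]              # similarity to the paired view
        mask = np.ones(n2, dtype=bool)
        mask[i] = False                                # drop self-similarity
        loss += -pos + np.log(np.exp(sim[i, mask]).sum())
    return loss / n2

rng = np.random.default_rng(1)
batch = rng.normal(size=(16, 4))
aligned = instance_contrastive_loss(batch, batch)                        # perfect positive pairs
mismatched = instance_contrastive_loss(batch, rng.normal(size=(16, 4)))  # unrelated "views"
```

Minimizing this loss pulls each series toward its transformed view and pushes it away from the rest of the batch, which is what lets feature learning and clustering share one objective.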
    Logistic Regression Click Prediction Algorithm Based on Combination Structure
    GUO Shangzhi, LIAO Xiaofeng, XIAN Kaiyi
    Computer Science    2024, 51 (2): 73-78.   DOI: 10.11896/jsjkx.230100052
    With the rapid development of the Internet and advertising platforms and the resulting flood of advertising information,to improve user click-through rate,an improved logistic regression click prediction algorithm based on a combination structure,logistic regression of combination structure (LRCS),is proposed.The algorithm builds on the observation that different types of features may have different audiences.First,FM is used to combine features and generate two types of combined features.Second,one type of combined feature is clustered with a clustering algorithm.Finally,the other type of combined feature is input into the per-cluster GBDT + logistic regression combination model generated by the clustering for prediction.Multi-angle validation on two public datasets and comparison with other commonly used click prediction algorithms show that LRCS delivers a certain performance improvement in click prediction.
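The "cluster first, then a per-segment model" structure can be sketched as follows. For a dependency-free illustration, a plain logistic regression stands in for the paper's GBDT + logistic regression stage, and the synthetic data, cluster centers, and function names are assumptions.

```python
import numpy as np

def train_lr(X, y, lr=0.5, steps=500):
    """Plain logistic regression via gradient descent (stands in for the
    GBDT + logistic regression stage of the paper)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))  # clip avoids overflow
        w -= lr * X.T @ (p - y) / len(y)
    return w

def segmented_fit_predict(X, y, centers):
    """Assign each sample to its nearest cluster center, then fit and apply
    one logistic regression per segment."""
    seg = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    pred = np.empty(len(y), dtype=int)
    for c in range(len(centers)):
        m = seg == c
        w = train_lr(X[m], y[m])
        pred[m] = (X[m] @ w > 0).astype(int)
    return pred

# Two audiences with opposite click behavior: a single global linear model
# cannot fit both, but per-segment models can.
rng = np.random.default_rng(2)
c0 = rng.normal(size=(50, 2))
c1 = rng.normal(size=(50, 2)) + [6, 0]
X = np.vstack([c0, c1])
y = np.concatenate([(c0[:, 1] > 0), (c1[:, 1] < 0)]).astype(int)
centers = np.array([[0.0, 0.0], [6.0, 0.0]])
acc = (segmented_fit_predict(X, y, centers) == y).mean()
```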
    Fuzzy Systems Based on Regular Vague Partitions and Their Approximation Properties
    PENG Xiaoyu, PAN Xiaodong, SHEN Hanhan, HE Hongmei
    Computer Science    2024, 51 (2): 79-86.   DOI: 10.11896/jsjkx.221100229
    This paper investigates the approximation problem of fuzzy systems based on different fuzzy basis functions.Firstly,multi-dimensional regular vague partitions are established from one-dimensional regular vague partitions and overlap functions,and fuzzy systems are designed by taking the elements of a partition as the fuzzy basis functions.With the help of the Weierstrass approximation theorem,it is concluded that these fuzzy systems are universal approximators,and the corresponding approximation error bounds are presented.Secondly,this paper proposes polynomial,exponential,and logarithmic fuzzy systems and gives their approximation error bounds in terms of the parameters of the membership functions.Finally,experiments are designed to compare the approximation capability of the different fuzzy systems.Experimental results further verify the correctness of the theoretical analysis.
    Label Noise Filtering Framework Based on Outlier Detection
    XU Maolong, JIANG Gaoxia, WANG Wenjian
    Computer Science    2024, 51 (2): 87-99.   DOI: 10.11896/jsjkx.221100264
    Noise is an important factor affecting the reliability of machine learning models,and label noise has a more decisive influence on model training than feature noise.Reducing label noise is a key step in classification tasks.Filtering is an effective way to deal with label noise:it neither requires estimating the noise rate nor relies on any loss function.However,most filtering algorithms may cause an over-cleaning phenomenon.To solve this problem,a label noise filtering framework based on outlier detection is first proposed,and a label noise filtering algorithm via adaptive nearest neighbor clustering (AdNN) is then presented.AdNN transforms label noise detection into an outlier detection problem.It considers the samples of each category separately and identifies all outliers.Outlier samples are then ignored according to their relative density,and the real label noise among the outliers is found and removed by a defined noise factor.Experiments on synthetic and benchmark datasets show that the proposed noise filtering method not only alleviates the over-cleaning phenomenon but also achieves good noise filtering effects and classification prediction performance.
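The "label noise as within-class outliers" idea can be illustrated with a minimal density-based sketch: a mislabeled point sits far from the other members of its labeled class, so its within-class neighbor distance is anomalously large. This is a generic k-NN outlier check under assumed thresholds, not the AdNN algorithm itself.

```python
import numpy as np

def knn_mean_distance(X, k=3):
    """Mean distance to the k nearest neighbors (large = isolated point)."""
    d = np.linalg.norm(X[:, None] - X[None], axis=-1)
    d.sort(axis=1)
    return d[:, 1:k + 1].mean(axis=1)   # column 0 is the zero self-distance

def flag_label_noise(X, y, k=3, factor=2.5):
    """Within each class, flag samples whose k-NN distance (measured among
    same-class samples) far exceeds the class median -- outliers that are
    candidates for label noise."""
    noisy = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        dist = knn_mean_distance(X[idx], k)
        noisy[idx] = dist > factor * np.median(dist)
    return noisy

rng = np.random.default_rng(4)
X = np.vstack([
    rng.normal(0, 0.1, (20, 2)),   # class 0 cluster
    rng.normal(5, 0.1, (20, 2)),   # class 1 cluster
    [[5.0, 5.0]],                  # sits in class 1's region...
])
y = np.array([0] * 20 + [1] * 20 + [0])  # ...but carries label 0 (planted noise)
noisy = flag_label_noise(X, y)
```

The median-relative threshold is what keeps the filter conservative; an absolute threshold would be far more prone to the over-cleaning the paper warns about.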
    Non-negative Matrix Factorization Parallel Optimization Algorithm Based on Lp-norm
    HUANG Lulu, TANG Shuyu, ZHANG Wei, DAI Xiangguang
    Computer Science    2024, 51 (2): 100-106.   DOI: 10.11896/jsjkx.230300040
    Non-negative matrix factorization is an important tool for image clustering,data compression,and feature extraction.Traditional non-negative matrix factorization algorithms mostly use the Euclidean distance to measure reconstruction error,which has proven effective in many tasks but still suffers from suboptimal clustering results and slow convergence.To solve these problems,the loss function of non-negative matrix factorization is reconstructed with the Lp-norm,and better clustering results are obtained by adjusting the coefficient p.Based on collaborative optimization theory and the Majorization-Minimization algorithm,this paper uses particle swarm optimization to solve the reconstructed non-negative matrix factorization problem in parallel.The feasibility and effectiveness of the proposed method are verified on real datasets,and the experimental results show that the proposed algorithm significantly improves program execution efficiency and outperforms traditional non-negative matrix factorization algorithms on a series of evaluation metrics.
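For context, the classic baseline the paper generalizes is multiplicative-update NMF under the Euclidean (p = 2) objective; the Lp-norm reconstruction and the PSO-based parallel solver are the paper's contributions and are not reproduced here.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Multiplicative-update NMF minimizing ||V - WH||_F^2, i.e. the p = 2
    (Euclidean) special case of the Lp-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # updates keep factors non-negative
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.random.default_rng(1).random((20, 10))   # non-negative data matrix
W, H = nmf(V, rank=5)
err = np.linalg.norm(V - W @ H)
```

Because the updates only multiply by non-negative ratios, non-negativity of W and H is preserved automatically, with no projection step needed.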
    Survey of Inferring Information Diffusion Networks
    WANG Yuchen, GAO Chao, WANG Zhen
    Computer Science    2024, 51 (1): 99-112.   DOI: 10.11896/jsjkx.230500127
    Information diffusion can be modeled as a stochastic process over a network.However,the topology of the underlying diffusion network and the pathways of spread are often not visible in real-world scenarios.Therefore,the inference of diffusion networks becomes critical for analyzing and understanding the diffusion process,tracking the pathways of spread,and even predicting future contagion events.There has been a surge of interest in diffusion network inference over the past few years.This paper surveys and summarizes the representative research in the field of diffusion network inference.Finally,it analyzes the open problems of diffusion network inference and provides a new perspective on the field.
    Generation Algorithm of Temporal Networks with Anchor Communities
    ZHENG Shuwen, WANG Chaokun
    Computer Science    2024, 51 (1): 113-123.   DOI: 10.11896/jsjkx.231000153
    Algorithms for network analysis tasks require synthetic graph datasets to evaluate their effectiveness and efficiency.Real-world graph data not only possess topological features such as community structures but also contain temporal information revealing evolutionary semantics.Nodes of real-world communities may interact with each other within a specific anchor time window.However,existing graph generation methods suffer from some limitations:most concentrate on either static community structures or temporal graphs without community structures,and are weak at generating communities active during an anchor time period.To surmount this weakness,this paper introduces the concept of the anchor community to depict frequent interactions among a group of nodes within an anchor time window.It then proposes an algorithm to synthesize general temporal networks based on a distribution probability generation model,and further proposes an efficient generation algorithm for temporal networks with anchor communities (GTN-AC),which accepts configuration input such as anchor time windows and specified degree and timestamp distributions.Extensive experimental results indicate that,compared with other baseline methods,GTN-AC achieves a faster generation speed while ensuring preferable generation quality.
    Parallel Transaction Execution Models Under Permissioned Blockchains
    DONG Hao, ZHAO Hengtai, WANG Ziyao, YUAN Ye, ZHANG Aoqian
    Computer Science    2024, 51 (1): 124-132.   DOI: 10.11896/jsjkx.230800201
    Most existing permissioned blockchain systems adopt serial transaction execution,which cannot exploit the high performance of multi-core processors.This serial method becomes a performance bottleneck in permissioned blockchains with high-performance consensus algorithms.To reduce transaction execution time in permissioned blockchains with the order-execute-validate architecture,two transaction concurrency models are proposed.First,an address-table-based parallel execution model is proposed,which maps the read and write sets of transactions to an address table through static analysis and constructs a scheduling graph from the address table to execute transactions without data conflicts in parallel.Second,a parallel execution model based on a multi-version timestamp ordering algorithm is proposed,in which the leader node uses the algorithm to pre-execute transactions in parallel and stores the scheduling graph in the block in the form of transaction dependency triplets.All validation nodes schedule via the transaction dependency triplets to execute transactions in parallel while preserving consistency.Finally,the two parallel transaction execution models are implemented in Tendermint,and performance experiments on the transaction execution phase and with multiple nodes are conducted.Experimental results show that the models reduce transaction execution time by 68.6% and 28.5% with a single node and 8 threads,and increase blockchain throughput by about 43.4% and 19.5% with 4 peer nodes and 8 threads per node,respectively.
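The address-table idea, grouping transactions whose read/write sets do not conflict so they can run in parallel, can be sketched as greedy batching. The conflict rule (a conflict exists when one transaction writes an address the other reads or writes) is standard; the data and batching policy below are simplified assumptions, not the paper's scheduler.

```python
def schedule_batches(txs):
    """Greedy conflict-free batching from static read/write sets.

    Batches run sequentially; transactions inside a batch share no data
    conflicts and could therefore execute in parallel."""
    batches = []
    for tx, (reads, writes) in txs.items():
        for batch in batches:
            conflict = any(
                writes & (r | w) or w & reads   # write-write, write-read, read-write
                for r, w in (txs[other] for other in batch)
            )
            if not conflict:
                batch.append(tx)
                break
        else:                                   # conflicts with every existing batch
            batches.append([tx])
    return batches

txs = {
    "t1": ({"A"}, {"B"}),   # reads A, writes B
    "t2": ({"C"}, {"D"}),   # touches neither A nor B
    "t3": ({"B"}, {"E"}),   # reads B, which t1 writes
}
batches = schedule_batches(txs)
```

Here t1 and t2 land in the same batch (no shared addresses), while t3 must wait for t1's write to B, mirroring the dependency edges a scheduling graph would record.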
    Interest Capturing Recommendation Based on Knowledge Graph
    JIN Yu, CHEN Hongmei, LUO Chuan
    Computer Science    2024, 51 (1): 133-142.   DOI: 10.11896/jsjkx.230500133
    As a kind of auxiliary information,knowledge graphs can provide more context and semantic association information for recommender systems,thereby improving the accuracy and interpretability of recommendations.By mapping items into a knowledge graph,recommender systems can inject external knowledge learned from the graph into user and item representations,thereby enhancing them.However,when learning user preferences,knowledge graph recommendation based on graph neural networks mainly utilizes knowledge such as attribute and relational information in the knowledge graph through item entities.Since user nodes are not directly connected to the knowledge graph,different relational and attribute information is semantically independent and lacks correlation with respect to user preferences,and it is difficult for knowledge graph based recommendation to accurately capture users' fine-grained preferences from the information in the knowledge graph.Therefore,to address the difficulty of capturing users' fine-grained interests,this paper proposes an interest-capturing recommendation algorithm based on a knowledge graph (KGICR).The algorithm leverages the relational and attribute information in knowledge graphs to learn user interests and improve the embedding representations of users and items.To fully utilize the relational information,a relational interest module is designed to learn users' fine-grained interests in different relations.This module represents each interest as a combination of relation vectors in the knowledge graph and employs a graph convolutional neural network to transfer user interests in the user-item graph and the knowledge graph and to learn user and item embedding representations.Furthermore,an attribute interest module is designed to learn users' fine-grained interests in different attributes.This module matches users and items with similar attributes by splitting and embedding and uses a method similar to the relational interest module for message propagation.Finally,experiments are conducted on two benchmark datasets,and the experimental results demonstrate the effectiveness and feasibility of the proposed method.
    Pre-training of Heterogeneous Graph Neural Networks for Multi-label Document Classification
    WU Jiawei, FANG Quan, HU Jun, QIAN Shengsheng
    Computer Science    2024, 51 (1): 143-149.   DOI: 10.11896/jsjkx.230600079
    Multi-label document classification aims to associate document instances with relevant labels and has received increasing research attention in recent years.Existing multi-label document classification methods attempt to explore information beyond the text,such as document metadata or label structure.However,these methods either use only the semantic information of metadata or neglect the long-tail distribution of labels,thereby ignoring higher-order relationships between documents and their metadata as well as the distribution pattern of labels,which hurts the accuracy of multi-label document classification.Therefore,this paper proposes a new multi-label document classification method based on the pre-training of heterogeneous graph neural networks.The method constructs a heterogeneous graph from documents and their metadata,adopts two contrastive pre-training strategies to capture the relationships between documents and their metadata,and counteracts the long-tail distribution of labels through a loss function,improving the accuracy of multi-label document classification.Experimental results on benchmark datasets show that the proposed method outperforms Transformer,BertXML,and MATCH by 8%,4.75%,and 1.3%,respectively.
    Geo-sensory Time Series Prediction Based on Joint Model of Autoregression and Deep Neural Network
    DONG Hongbin, HAN Shuang, FU Qiang
    Computer Science    2023, 50 (11): 41-48.   DOI: 10.11896/jsjkx.230500231
    Geo-sensory time series contain complex and dynamic semantic spatio-temporal correlations and geographic spatio-temporal correlations.Although a variety of deep learning models have been developed for time series prediction,few focus on capturing the multiple types of spatio-temporal correlations within geo-sensory time series.In addition,it is challenging to simultaneously predict the future values of multiple sensors at a certain time step.To address these issues,this paper proposes a joint model of autoregression and deep neural network (J-ARDNN) for the multi-objective prediction task of geo-sensory time series.In this model,a spatial module captures the multiple types of spatial correlations between different series,and a temporal module introduces a temporal convolutional network to extract the temporal dependencies within a single series.Moreover,an autoregression model is introduced to improve the robustness of J-ARDNN.To prove the superiority and effectiveness of J-ARDNN,it is evaluated on three real-world datasets from different fields.Experimental results show that the proposed model achieves better prediction performance than state-of-the-art baseline models.
    Rumor Detection Model on Social Media Based on Contrastive Learning with Edge-inference Augmentation
    LIU Nan, ZHANG Fengli, YIN Jiaqi, CHEN Xueqin, WANG Ruijin
    Computer Science    2023, 50 (11): 49-54.   DOI: 10.11896/jsjkx.221000043
    In recent years,to deal with the social problems caused by the wide spread of rumors,researchers have developed many deep-learning-based rumor detection methods.Although these methods improve detection performance by learning high-level representations of a rumor from its propagation structure,they still suffer from low reliability and cumulative error effects because they ignore the uncertainty of edges when constructing the propagation network.To address this problem,this paper proposes the edge-inference contrastive learning (EICL) model.EICL first constructs a propagation graph based on the timestamps of retweets (comments) for a given message.Then,it augments the event propagation graph with a newly designed edge-weight adjustment strategy to capture the edge uncertainty of the propagation structure.Finally,it employs contrastive learning to mitigate the sparsity of the original dataset and improve model generalization.Experimental results show that the accuracy of EICL improves by 2.0% and 3.0% on Twitter15 and Twitter16,respectively,compared with other state-of-the-art baselines,demonstrating that it can significantly improve rumor detection performance on social media.
    Parallel Mining Algorithm of Frequent Itemset Based on N-list and DiffNodeset Structure
    ZHANG Yang, WANG Rui, WU Guanfeng, LIU Hongyi
    Computer Science    2023, 50 (11): 55-61.   DOI: 10.11896/jsjkx.221000011
    Frequent itemset mining is a basic problem in data mining and plays an important role in many data mining applications.To solve the problems of the parallel frequent itemset mining algorithm MrPrePost in big data environments,such as degraded efficiency,unbalanced load across computing nodes,and redundant search,this paper proposes a parallel frequent itemset mining algorithm based on N-lists and DiffNodesets (PFIMND).Firstly,exploiting the advantages of the N-list and DiffNodeset data structures,a dataset sparsity estimation function (SE) is designed,and one of the two structures is selected to store the data according to the dataset's sparsity.Secondly,a computation estimation function (CE) is proposed to estimate the load of each item in the frequent 1-itemset list F-list,and the loads are grouped evenly according to their computational cost.Finally,the set-enumeration tree is used as the search space;to avoid combinatorial explosion and redundant search,a superset pruning strategy and a pruning strategy based on breadth-first search are designed to generate the final mining results.Experimental results show that compared with the similar algorithm HP-FIMND,PFIMND improves the effect of mining frequent itemsets on the Susy dataset by 12.3%.
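For readers unfamiliar with the underlying problem, a level-wise (Apriori-style) sketch of frequent itemset mining is below. PFIMND's N-list/DiffNodeset structures and pruning strategies are an efficiency layer over this same task and are not reproduced here.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise frequent itemset mining: count candidates of size k, keep
    the frequent ones, and build size-(k+1) candidates from their unions."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    result, k = {}, 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # next level: unions of frequent k-itemsets that have size k + 1
        current = list({a | b for a, b in combinations(frequent, 2)
                        if len(a | b) == k + 1})
        k += 1
    return result

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = frequent_itemsets(transactions, min_support=3)
```

Building candidates only from already-frequent itemsets is the same anti-monotonicity that the paper's superset pruning exploits on the set-enumeration tree.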
    Clustering Method Based on Contrastive Learning for Multi-relation Attribute Graph
    XIE Zhuo, KANG Le, ZHOU Lijuan, ZHANG Zhihong
    Computer Science    2023, 50 (11): 62-70.   DOI: 10.11896/jsjkx.220900166
    In the real world,there are many complex graph data with multiple relations between nodes,namely multi-relation attribute graphs.Graph clustering is one approach for mining similar information from graph data.However,most existing graph clustering methods assume that only a single type of relation exists between nodes.Even those that consider the multiple relations of a graph either use only node attributes for training or regard graph representation learning and clustering as two completely independent processes.Recently,Deep Graph Infomax (DGI) has shown promising results on many downstream tasks,but it has two major limitations:firstly,DGI does not fully explore the various relations among nodes;secondly,DGI does not jointly optimize the graph representation learning and clustering tasks,resulting in suboptimal clustering results.To address these problems,this paper proposes a novel framework,the clustering method based on contrastive learning for multi-relation attribute graphs (CCLMAG),for learning node embeddings suitable for clustering in an unsupervised way.More specifically,1) a community-level mutual information mechanism is applied to solve the problem that DGI ignores cluster information;2) an embedding fusion module is added to aggregate the embeddings of nodes under different relations;and 3) a clustering optimization module is added to link graph representation learning and clustering so that the learned node representations are more suitable for the clustering task,thus enhancing the interpretability of the clustering results.Extensive experimental results on three multi-relation attribute graph datasets and a real-world futures dataset demonstrate the superiority of CCLMAG over state-of-the-art methods.
    Study on Short Text Clustering with Unsupervised SimCSE
    HE Wenhao, WU Chunjiang, ZHOU Shijie, HE Chaoxin
    Computer Science    2023, 50 (11): 71-76.   DOI: 10.11896/jsjkx.220900214
    Traditional shallow text clustering methods face challenges such as limited context information,irregular use of words,and few words with actual meaning when clustering short texts,resulting in sparse embedding representations of the text and difficulty in extracting key features.To address these issues,a deep clustering model SSKU(SBERT SimCSE Kmeans Umap) incorporating simple data augmentation methods is proposed in this paper.The model uses SBERT to embed short texts and fine-tunes the embedding model using the unsupervised SimCSE method in conjunction with the deep clustering KMeans algorithm,improving the embedding representations of short texts to make them suitable for clustering.To alleviate the sparse features of short texts and optimize the embedding results,the UMAP manifold dimensionality reduction method is used to learn the local manifold structure.The KMeans algorithm is then used to cluster the dimensionality-reduced embeddings to obtain the clustering results.Extensive experiments are carried out on four publicly available short text datasets,such as StackOverFlow and Biomedical,and compared with the latest deep clustering algorithms.The results show that the proposed model exhibits good clustering performance in terms of both accuracy and normalized mutual information evaluation metrics.
    Reference | Related Articles | Metrics
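The reduce-then-cluster stage of the pipeline can be sketched as follows. This is a hedged NumPy-only illustration: PCA stands in for UMAP, a tiny k-means with fixed seed points stands in for the library implementation, and synthetic vectors replace SBERT sentence embeddings.

```python
import numpy as np

def pca_reduce(X, dim):
    # Stand-in for UMAP: project onto the top principal components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def kmeans(X, k, init, iters=20):
    centers = X[list(init)]                      # deterministic seed points
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels

# Two well-separated synthetic "sentence embedding" clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 16)), rng.normal(3, 0.1, (20, 16))])
labels = kmeans(pca_reduce(X, 2), k=2, init=[0, 39])
print(bool(labels[:20].var() == 0 and labels[20:].var() == 0))  # True
```

With well-separated data the reduced 2-D space preserves the cluster structure, so k-means recovers both groups exactly.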
    Graph Clustering Algorithm Based on Node Clustering Complexity
    ZHENG Wenping, WANG Fumin, LIU Meilin, YANG Gui
    Computer Science    2023, 50 (11): 77-87.   DOI: 10.11896/jsjkx.230600003
    Abstract258)      PDF(pc) (4558KB)(2457)       Save
    Graph clustering is an important task in the analysis of complex networks,which can reveal the community structure within a network.However,the clustering complexity of nodes varies throughout the network.To address this issue,a graph clustering algorithm based on node clustering complexity(GCNCC) is proposed.It calculates the clustering complexity of nodes and assigns pseudo-labels to nodes with low clustering complexity,then uses these pseudo-labels as supervised information to lower the clustering complexity of other nodes and obtain the community structure of the network.The GCNCC algorithm consists of three main modules:node representation,node clustering complexity assessment,and graph clustering.The node representation module represents nodes in a low-dimensional space that preserves their cluster structure;the node clustering complexity assessment module identifies nodes with low clustering complexity and assigns them pseudo-labels,which can be used to update the clustering complexity of other nodes;the graph clustering module uses label propagation to spread the pseudo-labels from nodes with low clustering complexity to those with high clustering complexity.Compared with 9 classic algorithms on 3 real citation networks and 3 biological datasets,the proposed GCNCC performs well in terms of ACC,NMI,ARI,and F1.
    Reference | Related Articles | Metrics
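The label propagation step, which spreads pseudo-labels from low-complexity seed nodes to the rest of the graph, might look like the sketch below. The strict-majority tie-breaking rule is an illustrative assumption, not necessarily GCNCC's exact update.

```python
import numpy as np

def propagate(adj, seed_labels, iters=20):
    """Spread pseudo-labels from low-complexity (seeded) nodes to the rest.

    adj:         (N, N) symmetric adjacency matrix.
    seed_labels: length-N array, -1 for unlabelled nodes.
    """
    labels = seed_labels.copy()
    n_classes = seed_labels.max() + 1
    for _ in range(iters):
        snap = labels.copy()                      # update from a snapshot
        for v in np.where(labels == -1)[0]:
            neigh = snap[adj[v] > 0]
            counts = np.bincount(neigh[neigh >= 0], minlength=n_classes)
            if counts.sum() and counts.max() > counts.sum() - counts.max():
                labels[v] = counts.argmax()       # strict majority only
    return labels

# Two triangles joined by one edge; one seed node per triangle.
adj = np.zeros((6, 6), dtype=int)
for a, b in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    adj[a, b] = adj[b, a] = 1
seeds = np.array([0, -1, -1, -1, -1, 1])
result = propagate(adj, seeds)
print(result.tolist())  # [0, 0, 0, 1, 1, 1]
```

Each triangle adopts the label of its seeded member, recovering the two communities.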
    Time-aware Transformer for Traffic Flow Forecasting
    LIU Qidong, LIU Chaoyue, QIU Zixin, GAO Zhimin, GUO Shuai, LIU Jizhao, FU Mingsheng
    Computer Science    2023, 50 (11): 88-96.   DOI: 10.11896/jsjkx.221000201
    Abstract259)      PDF(pc) (3039KB)(2543)       Save
    As a key part of intelligent transportation systems,traffic flow forecasting faces the challenge of long-term prediction inaccuracy.The key factor is that traffic flow has complicated spatial and temporal correlations.Recently,the emerging success of Transformer has shown promising results in time series analysis.However,there are two obstacles when applying Transformer to traffic flow forecasting:1)it is difficult for static attention mechanisms to capture the dynamic changes of traffic flow along the space and time dimensions;2)the autoregressive decoder in Transformer can cause an error accumulation problem.To address the above problems,this paper proposes a time-aware Transformer(TAformer) for traffic flow forecasting.Firstly,it proposes a time-aware attention mechanism that customizes the attention calculation according to time features,so as to estimate the spatial and temporal dependencies more accurately.Secondly,it discards the teacher forcing mechanism during the training phase and proposes a non-autoregressive inference method to avoid the problem of error accumulation.Finally,extensive experiments on two real traffic datasets show that the proposed method can effectively capture the spatio-temporal dependence of traffic flow.Compared with the state-of-the-art baseline methods,the proposed method improves the performance of long-term prediction by 2.09%~4.01%.
    Reference | Related Articles | Metrics
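One way to make attention "time-aware" is to select different projection matrices per time-of-day slot, so attention weights can differ between, say, rush hour and midnight even for identical flow values. The sketch below is one plausible reading of customizing the attention calculation by time features, not TAformer's published formulation; all shapes and parameters are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def time_aware_attention(X, t_slots, Wq, Wk, Wv):
    """Self-attention with slot-specific query/key projections.

    X:       (T, D) flow features for one sensor over T steps.
    t_slots: (T,) integer time-of-day slot per step.
    Wq, Wk:  (S, D, D) one projection per slot; Wv: (D, D).
    """
    Q = np.einsum('td,tde->te', X, Wq[t_slots])   # slot-specific projections
    K = np.einsum('td,tde->te', X, Wk[t_slots])
    A = softmax(Q @ K.T / np.sqrt(X.shape[1]))    # scaled dot-product weights
    return A @ (X @ Wv), A

rng = np.random.default_rng(0)
T, D, S = 12, 8, 4                                # steps, features, time slots
X = rng.normal(size=(T, D))
slots = np.arange(T) % S
Wq, Wk = rng.normal(size=(S, D, D)), rng.normal(size=(S, D, D))
out, A = time_aware_attention(X, slots, Wq, Wk, rng.normal(size=(D, D)))
print(out.shape)  # (12, 8)
```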
    Self-supervised Action Recognition Based on Skeleton Data Augmentation and Double Nearest Neighbor Retrieval
    WU Yushan, XU Zengmin, ZHANG Xuelian, WANG Tao
    Computer Science    2023, 50 (11): 97-106.   DOI: 10.11896/jsjkx.230500158
    Abstract95)      PDF(pc) (2753KB)(2403)       Save
    Traditional self-supervised methods based on skeleton data often take different data augmentations of a sample as positive examples and regard all the other samples as negative examples,which makes the ratio of positive and negative samples seriously unbalanced and limits the usefulness of samples with the same semantic information.To solve the above problems,this paper proposes a double nearest neighbor retrieval action recognition algorithm named DNNCLR,in which positive samples are not limited to data augmentations.First,a new joint-level spatial data augmentation,namely Bodypart augmentation,is designed based on the physical connections of human joints:parts of the input skeleton sequence are randomly replaced with arrays drawn from a normal distribution to obtain high-level semantic embeddings.Secondly,to free positive samples from the limitation of data augmentation,a more reasonable double nearest neighbor retrieval(DNN) positive sample augmentation strategy is proposed,together with a double nearest neighbor retrieval contrastive loss(DNN Loss).Specifically,by using a support set for global retrieval,the search range of the positive sample set is expanded to new data points that cannot be covered by ordinary data augmentation.The negative sample set also contains misjudged positive samples,i.e.,skeleton samples with the same semantic information but from different videos.Therefore,by applying nearest neighbor retrieval again,these potential positives are retrieved from the negative sample set to further expand the positive sample set.The resulting contrastive loss forces the model to learn more general feature representations and makes model optimization more reasonable.Finally,the DNNCLR algorithm is applied to the AimCLR model to obtain the AimDNNCLR model,which is evaluated with a linear protocol on the NTU-RGB+D dataset.Compared with the baseline model,the proposed method has an average improvement of 3.6% in accuracy.
    Reference | Related Articles | Metrics
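The double retrieval strategy can be sketched in two steps: retrieve the anchor's nearest neighbours in a support set, then re-screen the negative set for latent positives closest to those support samples. The toy 2-D vectors and cosine-similarity top-k below are illustrative assumptions, not the paper's embedding space.

```python
import numpy as np

def topk(q, bank, k):
    # cosine-similarity top-k indices of q within a bank of vectors
    sims = bank @ q / (np.linalg.norm(bank, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:k]

def double_nn_positives(anchor, support, negatives, k=2):
    """First retrieval: the anchor's k nearest support samples become extra
    positives. Second retrieval: the negative set is re-screened for latent
    positives nearest to those retrieved support samples."""
    first = topk(anchor, support, k)
    second = sorted({int(topk(support[i], negatives, 1)[0]) for i in first})
    return [int(i) for i in first], second

anchor = np.array([1.0, 0.0])
support = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
negatives = np.array([[1.0, 0.05], [-1.0, -1.0], [0.0, -1.0]])
print(double_nn_positives(anchor, support, negatives))  # ([0, 1], [0])
```

Negative sample 0 lies close to the retrieved support samples, so it is rescued as a latent positive.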
    Community Discovery Algorithm for Attributed Networks Based on Bipartite Graph Representation
    ZHAO Xingwang, XUE Jinfang
    Computer Science    2023, 50 (11): 107-113.   DOI: 10.11896/jsjkx.221000226
    Abstract251)      PDF(pc) (1487KB)(2394)       Save
    Community discovery in attributed networks is an important research topic in network data analysis.To improve the accuracy of community discovery,most existing algorithms compute a low-dimensional representation of the attributed network by fusing topological and attribute information,and then perform community discovery based on the low-dimensional features.Such algorithms,however,are typically based on deep learning models for representation learning,which lack interpretability.Therefore,in order to improve the accuracy and interpretability of community discovery results,this paper proposes a community discovery algorithm for attributed networks based on bipartite graph representation.Firstly,the topological and attribute information of the attributed network is used to calculate the probability of each node serving as a representative point,and a certain proportion of nodes are chosen as representative points.Secondly,based on the topological structure and node attributes,the distance of each node to the representative points is calculated to construct a bipartite graph.Finally,community discovery is performed on the bipartite graph using a spectral clustering algorithm.Experiments are carried out on artificial and real attributed networks to compare the proposed algorithm with existing algorithms.Experimental results show that the proposed algorithm outperforms the existing algorithms in terms of evaluation indices such as normalized mutual information and adjusted Rand index.
    Reference | Related Articles | Metrics
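The representative-point construction can be sketched as follows. This is a heavily simplified stand-in: degree is used as the representativeness score, the node-to-representative distance mixes one-hop topology with attribute distance, and nearest-representative assignment replaces the paper's spectral clustering of the bipartite graph. All of these substitutions are assumptions for illustration.

```python
import numpy as np

def bipartite_communities(adj, attrs, n_reps=2):
    """Pick representatives, build a node-to-representative bipartite weight
    matrix, and assign each node to its closest representative."""
    degree = adj.sum(axis=1)
    reps = np.argsort(-degree)[:n_reps]          # high-degree representatives
    topo = 1 - adj[:, reps]                      # one-hop topological distance
    attr = np.linalg.norm(attrs[:, None] - attrs[reps][None], axis=2)
    B = topo + attr                              # bipartite weight matrix
    B[reps, np.arange(n_reps)] = 0.0             # a representative is its own closest
    return reps, B.argmin(axis=1)

# Two triangles joined by one edge, with matching 1-D attributes.
adj = np.zeros((6, 6))
for a, b in [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]:
    adj[a, b] = adj[b, a] = 1
attrs = np.array([[0.], [0.], [0.], [1.], [1.], [1.]])
reps, comm = bipartite_communities(adj, attrs)
print(reps.tolist(), comm.tolist())  # [2, 3] [0, 0, 0, 1, 1, 1]
```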
    Road Network Topology-aware Trajectory Representation Learning
    CHEN Jiajun, CHEN Wei, ZHAO Lei
    Computer Science    2023, 50 (11): 114-121.   DOI: 10.11896/jsjkx.221000058
    Abstract194)      PDF(pc) (2889KB)(2448)       Save
    The approaches developed for trajectory representation learning(TRL) on road networks can be divided into two categories,i.e.,sequence models based on recurrent neural networks(RNN) and long short-term memory(LSTM),and learning models based on the self-attention mechanism.Despite the significant contributions of these studies,they still suffer from the following problems.(1)The methods designed for road network representation learning in existing work ignore the transition probability between connected road segments and cannot fully capture the topological structure of the given road network.(2)The self-attention based learning models perform better than sequence models on short and medium trajectories but underperform on long trajectories,as they fail to characterize the long-term semantic features of trajectories well.Motivated by these findings,this paper proposes a new trajectory representation learning model,namely trajectory representation learning on road networks via masked sequence to sequence network(TRMS).Specifically,the model extends the traditional DeepWalk algorithm with a probability-aware walk to fully capture the topological structure of road networks,and then utilizes the masked Seq2Seq learning framework and the self-attention mechanism in a unified manner to capture the long-term semantic features of trajectories.Finally,experiments on real-world datasets demonstrate that TRMS outperforms the state-of-the-art methods in embedding short,medium,and long trajectories.
    Reference | Related Articles | Metrics
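The probability-aware walk differs from DeepWalk's uniform neighbour choice by sampling the next road segment according to empirical transition probabilities. A minimal sketch (the toy three-segment network is an assumption for illustration):

```python
import numpy as np

def probability_aware_walk(P, start, length, rng):
    """Random walk following transition probabilities between road segments.

    P: (N, N) row-stochastic matrix; P[i, j] = 0 if segments i and j are
       not connected on the road network.
    """
    walk = [start]
    for _ in range(length - 1):
        walk.append(int(rng.choice(len(P), p=P[walk[-1]])))
    return walk

# Toy road network: three segments forming a deterministic cycle.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
walk = probability_aware_walk(P, 0, 5, np.random.default_rng(0))
print(walk)  # [0, 1, 2, 0, 1]
```

The generated walks can then serve as pseudo-sentences for skip-gram training, as in DeepWalk.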
    NeuronSup:Deep Model Debiasing Based on Bias Neuron Suppression
    NI Hongjie, LIU Jiawei, ZHENG Haibin, CHEN Yipeng, CHEN Jinyin
    Computer Science    2023, 50 (11): 122-131.   DOI: 10.11896/jsjkx.220900169
    Abstract246)      PDF(pc) (3193KB)(2426)       Save
    With the wide application of deep learning,researchers should focus not only on the classification performance of a model but also on whether its decisions are fair and credible.A deep learning model with decision bias may cause great negative effects,so how to maintain classification accuracy while improving the decision fairness of the model is very important.At present,many methods have been proposed to improve the individual fairness of models,but they still fall short in debiasing effect,availability of the debiased model,and debiasing efficiency.To this end,this paper analyzes the abnormal activation of neurons when individual bias exists in a deep model,and proposes a model debiasing method,NeuronSup,based on suppressing bias neurons,which has the advantages of significantly reducing individual bias,little impact on main task performance,and low time complexity.To be specific,the concept of bias neuron is first proposed,based on the phenomenon that some neurons in a deep model are abnormally activated by individual bias.Then,the bias neurons are located using discrimination samples,and the individual bias of the deep model is greatly reduced by suppressing their abnormal activation.The main-task performance neurons are determined according to the maximum-weight edge of each neuron;by keeping their parameters unchanged,the influence of the debiasing operation on the classification performance of the deep model is reduced.Because NeuronSup only debiases specific neurons in the deep model,its time complexity is low and its efficiency is high.Finally,in debiasing experiments on three real datasets with six sensitive attributes,compared with five contrasting algorithms,NeuronSup reduces the individual fairness index THEMIS by more than 50% while keeping the impact of the debiasing operation on classification accuracy below 3%,which verifies that NeuronSup reduces individual bias while preserving the classification ability of the deep model.
    Reference | Related Articles | Metrics
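The core mechanism, finding neurons that activate very differently on a discrimination pair (two inputs identical except for the sensitive attribute) and zeroing their outgoing weights, can be sketched as below. The one-layer setting, gender as the sensitive attribute, and mean activation gap as the bias score are illustrative assumptions.

```python
import numpy as np

def find_bias_neurons(h_a, h_b, top=1):
    """Bias neurons have the largest activation gap across a discrimination
    pair: two inputs identical except for the sensitive attribute."""
    gap = np.abs(h_a - h_b).mean(axis=0)          # per-neuron activation gap
    return np.argsort(-gap)[:top]

def suppress(W_out, bias_idx):
    """Zero the outgoing weights of bias neurons; all other parameters,
    including high-weight main-task neurons, stay untouched."""
    W_out = W_out.copy()
    W_out[bias_idx, :] = 0.0
    return W_out

h_male   = np.array([[0.2, 0.9, 0.1]])  # hidden activations of two inputs
h_female = np.array([[0.2, 0.1, 0.1]])  # differing only in the sensitive attribute
W_out = np.array([[1.0, -1.0], [2.0, 0.5], [0.3, 0.3]])
idx = find_bias_neurons(h_male, h_female)
W_fixed = suppress(W_out, idx)
print(idx.tolist(), bool(np.allclose(h_male @ W_fixed, h_female @ W_fixed)))
```

After suppressing neuron 1, the two discrimination inputs produce identical outputs, i.e. the individual bias on this pair vanishes.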
    Natural Noise Filtering Algorithm for Point-of-Interest Recommender Systems
    ZHU Jun, HAN Lixin, ZONG Ping, XU Yiqing, XIA Ji’an, TANG Ming
    Computer Science    2023, 50 (11): 132-142.   DOI: 10.11896/jsjkx.230400045
    Abstract127)      PDF(pc) (4795KB)(2420)       Save
    The natural noise inherent in the original datasets of recommender systems(RSs) causes error and interference for recommendation algorithms.Existing studies pay more attention to the malicious noise represented by various security attacks,while the natural noise,which is more subtle and difficult to deal with,has rarely been documented.Most research on natural noise is conducted for conventional RSs.However,the data features and the causes and forms of natural noise in point-of-interest(POI) RSs are all different from those in conventional RSs.To filter the natural noise in POI RSs,a novel natural noise filtering method(NFDC) based on dispersion quantification and clustering distance analysis is proposed.The dispersion of a subset of the original check-in dataset is defined and calculated to indicate the data-driven uncertainty,and the accuracy metric F1 is adopted to represent the prediction-driven uncertainty.The measures of dispersion and accuracy metric vectors are empirically categorized to identify the proportion of potential noise.A fuzzy C-means-based denoising algorithm is performed to analyze the similarity of user behavior patterns and then screen potentially noisy points based on clustering distance analysis.A customized rule is designed to further verify and delete the natural noise.Extensive experiments are conducted on two real-world location-based social network datasets,Brightkite and Gowalla.The datasets processed by NFDC and four benchmark algorithms are respectively input into five representative POI recommendation algorithms for comparison.Experimental results show that NFDC effectively filters the natural noise and provides reliable input for RSs.Compared with the highest accuracy supported by the other denoising methods,the accuracy on the NFDC-processed Brightkite and Gowalla datasets improves by 15.95% and 5.00% on average,respectively.
    Reference | Related Articles | Metrics
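The fuzzy C-means screening step can be sketched as follows: check-ins whose maximum cluster membership stays low are ambiguous and become noise candidates. The minimal FCM below and the 0.6 membership threshold are illustrative assumptions, not NFDC's exact parameters.

```python
import numpy as np

def fuzzy_cmeans(X, init, m=2.0, iters=50):
    """Minimal fuzzy C-means; returns the membership matrix U of shape (n, c).
    `init` gives indices of the initial cluster centers (for determinism)."""
    centers = X[list(init)]
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** p * (d ** -p).sum(axis=1, keepdims=True))
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U

# Check-ins form two tight clusters; the midpoint is a suspicious outlier.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [3.0, 3.0], [3.0, 3.1], [3.1, 3.0],
              [1.5, 1.55]])
U = fuzzy_cmeans(X, init=[0, 3])
noisy = np.where(U.max(axis=1) < 0.6)[0]   # ambiguous membership => noise candidate
print(noisy.tolist())  # [6]
```

Only the between-clusters check-in is flagged; points firmly inside a behavior cluster keep a high membership and pass the screen.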
    Contrastive Clustering with Consistent Structural Relations
    XU Jie, WANG Lisong
    Computer Science    2023, 50 (9): 123-129.   DOI: 10.11896/jsjkx.220700288
    Abstract328)      PDF(pc) (2489KB)(639)       Save
    As a basic unsupervised learning task,clustering aims to divide unlabeled and mixed images into semantically similar classes.Some recent approaches improve the model's ability to discriminate between different semantic classes by introducing data augmentation,using contrastive learning methods to learn feature representations and cluster assignments,which may cause feature embeddings of samples from the same semantic class to be separated.Aiming at the above problems,a contrastive clustering method with consistent structural relations(CCR) is proposed,which performs contrastive learning at the instance level and the cluster level,and adds consistency constraints at the relation level,so that the model can learn more information from positive data pairs and reduce the impact of separated cluster embeddings.Experimental results show that CCR obtains better results than recent unsupervised clustering methods on image benchmark datasets.The average accuracy on the CIFAR-10 and STL-10 datasets improves by 1.7% compared with the best methods under the same experimental settings,and improves by 1.9% on the CIFAR-100 dataset.
    Reference | Related Articles | Metrics
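The instance-level contrastive term can be sketched as a standard NT-Xent loss, shown below in NumPy; the cluster-level term and CCR's relation-level consistency constraint are omitted, and the batch of random vectors is an illustrative assumption.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Instance-level contrastive loss: the two augmented views of each
    sample are positives, every other sample in the batch is a negative."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)               # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
aligned = nt_xent(z1, z1 + 0.01 * rng.normal(size=(8, 16)))
misaligned = nt_xent(z1, z1[::-1])   # views paired with the wrong samples
print(bool(aligned < misaligned))    # True
```

Correctly paired views give a much lower loss, which is exactly the signal the contrastive objective exploits.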
    Graph Similarity Search with High Efficiency and Low Index
    QIU Zhen, ZHENG Zhaohui
    Computer Science    2023, 50 (9): 130-138.   DOI: 10.11896/jsjkx.220700105
    Abstract177)      PDF(pc) (2297KB)(634)       Save
    Graph similarity search retrieves the graphs similar to a query graph under a given measure,and usually adopts the “filtering-verification” framework.Aiming at the problems of existing methods,such as loose lower bounds and large index space,an improved graph similarity search algorithm(Z-Index) based on query graph partition,with multi-level filtering and low index space,is proposed.Firstly,a pre-candidate set is obtained by global coarse-grained filtering.Secondly,a query graph partitioning algorithm based on extension probability is proposed,and a hierarchical filtering mechanism is adopted to further shrink the candidate set and tighten the lower bound.Finally,the sequence similarity difference is introduced to compute the sparsity of the data distribution,and partition compression and difference compression algorithms are proposed to construct a “zero” index structure,so as to reduce the index space.Experimental results show that the Z-Index algorithm has a tighter lower bound and reduces the candidate set size by about 50%.Moreover,its execution time is greatly reduced,and it still shows good scalability even with a tiny index space.
    Reference | Related Articles | Metrics
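The filtering-verification framework itself can be sketched as below, using a simple vertex-label-count lower bound on graph edit distance as the filter. Z-Index's actual partition-based bound is tighter; both the bound and the stand-in verifier here are illustrative assumptions.

```python
from collections import Counter

def label_lower_bound(labels_g, labels_h):
    """Cheap GED lower bound: every vertex label that cannot be matched
    across the two graphs costs at least one edit operation."""
    cg, ch = Counter(labels_g), Counter(labels_h)
    return max(sum((cg - ch).values()), sum((ch - cg).values()))

def similarity_search(query, dataset, tau, verify):
    """Filtering-verification: drop graphs whose lower bound already
    exceeds tau; run the expensive exact check only on survivors."""
    candidates = [i for i, g in enumerate(dataset)
                  if label_lower_bound(query, g) <= tau]
    return [i for i in candidates if verify(query, dataset[i]) <= tau]

# Graphs abstracted to their vertex-label multisets for this sketch.
query = ['A', 'B', 'C']
dataset = [['A', 'B', 'C'], ['A', 'B', 'D'], ['X', 'Y', 'Z']]
verify = label_lower_bound   # stand-in for an exact GED routine (assumption)
print(similarity_search(query, dataset, tau=1, verify=verify))  # [0, 1]
```

The third graph is rejected by the filter alone, so the costly verification never touches it; tighter bounds shrink the verified set further.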
    Human Mobility Pattern Prior Knowledge Based POI Recommendation
    YI Qiuhua, GAO Haoran, CHEN Xinqi, KONG Xiangjie
    Computer Science    2023, 50 (9): 139-144.   DOI: 10.11896/jsjkx.220900114
    Abstract361)      PDF(pc) (2574KB)(601)       Save
    Point-of-interest(POI) recommendation is a fundamental task in location-based social networks,which provides users with personalized place recommendations.However,current POI recommendation methods mostly learn from users' check-in histories and user relationship networks,and cannot effectively exploit the travel patterns of urban crowds.To solve this problem,a human mobility pattern extraction(HMPE) framework is first proposed,which takes advantage of graph neural networks to extract human mobility patterns,and introduces an attention mechanism to capture the spatio-temporal information of urban traffic patterns.By establishing downstream tasks and designing upsampling modules,HMPE restores representation vectors to the task objectives,and an end-to-end framework is built to pre-train the mobility pattern extraction module.Secondly,the human mobility recommendation(HMRec) algorithm is proposed,which introduces prior knowledge of crowd movement patterns so that the recommendation results better match human travel intentions in cities.Extensive experiments show that HMRec is superior to the baseline models.Finally,the existing problems and future research directions of POI recommendation are discussed.
    Reference | Related Articles | Metrics
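The graph-neural-network backbone of a mobility pattern extractor can be sketched as a single graph convolution over a region graph. The region adjacency, feature dimensions, and weight matrix below are illustrative assumptions, not HMPE's architecture.

```python
import numpy as np

def gcn_layer(adj, H, W):
    """One graph-convolution step: aggregate neighbour features with
    self-loops and symmetric normalisation, then a linear map + ReLU."""
    A = adj + np.eye(len(adj))
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(d[:, None] * d[None, :])
    return np.maximum(A_hat @ H @ W, 0.0)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],      # 4 city regions, edges = spatial adjacency
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 5))        # region features (e.g. visit counts)
Z = gcn_layer(adj, H, rng.normal(size=(5, 3)))
print(Z.shape)  # (4, 3)
```

Stacking such layers (plus the attention and upsampling modules the abstract mentions) yields region embeddings that encode crowd movement patterns for the downstream recommender.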