Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Content of Big Data & Data Science in our journal
    Attention-based Multi-scale Distillation Anomaly Detection
    QIAO Hong, XING Hongjie
    Computer Science    2024, 51 (6A): 230300223-11.   DOI: 10.11896/jsjkx.230300223
    In anomaly detection methods based on knowledge distillation, the teacher network is much larger than the student network, so the feature representations obtained at the same image position correspond to different receptive fields. To resolve this, the student network can be given the same structure as the teacher network. However, in the testing phase, identical teacher and student networks produce feature representations that differ too little, which hurts anomaly detection performance. To address this, ECA-based multi-scale knowledge distillation anomaly detection (ECA-MSKDAD) is proposed, together with a relative distance loss function based on data augmentation. A pre-trained network serves as the teacher, and a network with the same structure serves as the student. In the training stage, data augmentation expands the training set, and an efficient channel attention (ECA) module is introduced into the student network to increase the difference between teacher and student, enlarge the reconstruction error on abnormal data, and improve detection performance. In addition, the relative distance loss transfers the relationships among data from the teacher network to the student network and optimizes the student's network parameters. Experiments on MVTec AD show that, compared with nine related methods, the proposed method achieves better performance in anomaly detection and anomaly localization.
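The relative distance loss described in the ECA-MSKDAD abstract transfers pairwise relations between samples from the teacher's feature space to the student's. A minimal pure-Python sketch of one plausible formulation follows; the paper's exact loss may differ, and all names are illustrative.

```python
def pairwise_dists(feats):
    """Euclidean distance between every pair of feature vectors."""
    n = len(feats)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d[i][j] = sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])) ** 0.5
    return d

def relative_distance_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student pairwise-distance
    matrices: penalizes the student for distorting relations between samples,
    not for failing to copy the teacher's features exactly."""
    dt = pairwise_dists(teacher_feats)
    ds = pairwise_dists(student_feats)
    n = len(dt)
    return sum((dt[i][j] - ds[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

The loss is zero only when the student preserves every inter-sample distance the teacher induces, which is the sense in which relations (rather than raw features) are distilled.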
    Hierarchical Traffic Flow Prediction Model Based on Graph Autoencoder and GRU Network
    ZHAO Ziqi, YANG Bin, ZHANG Yuanguang
    Computer Science    2024, 51 (6A): 230400148-6.   DOI: 10.11896/jsjkx.230400148
    Accurate traffic flow prediction not only gives traffic administrators a solid basis for decisions, but also eases congestion. In traffic flow forecasting, extracting valid spatio-temporal characteristics of the traffic flow is a prerequisite for effective prediction. Most existing methods use data from future moments for supervised learning, so the extracted features have limitations. To address the problem that existing prediction models cannot fully exploit the spatio-temporal characteristics of traffic flows, this paper proposes a hierarchical traffic prediction model based on an improved graph autoencoder and gated recurrent unit. The graph attention autoencoder first mines the spatial characteristics of the traffic flow in an unsupervised manner, and the gated recurrent unit then extracts temporal features. The hierarchical structure trains the two stages separately to learn spatio-temporal dependencies, aiming to capture the spatial topology naturally present in the road network and to remain compatible with traffic flow prediction tasks at different time steps. Extensive experiments demonstrate that the proposed GAE-GRU model achieves excellent performance on different datasets, outperforming the baseline models in MAE, RMSE and MAPE.
    Imbalanced Data Oversampling Method Based on Variance Transfer
    ZHENG Yifan, WANG Maoning
    Computer Science    2024, 51 (6A): 230400198-6.   DOI: 10.11896/jsjkx.230400198
    Resampling is an important approach to imbalanced data classification. However, when the dataset is very small, undersampling loses important information, so oversampling is the research focus for imbalanced data classification. Although existing oversampling methods partially resolve the imbalance between classes, they introduce no additional information into the minority class, so a risk of overfitting remains. To address these problems, VTO, an oversampling method based on variance transfer from the majority class, is proposed in this paper. The method extracts shift vectors from the majority class and adjusts them with the feature weight matrices of the minority and majority classes. Shift vectors that pass a confidence filter are then superimposed on the minority-class center, introducing majority-class variance into the generation of new minority samples and enriching the minority-class feature space. To verify the effectiveness of the proposed algorithm, a decision tree is used as the classification model on six KEEL datasets. Compared with SMOTEENN and other oversampling methods, with F-score and PR-AUC as evaluation metrics, the results show that VTO is more advantageous for imbalanced data classification.
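The core generation step of VTO can be sketched in a few lines: deviations of majority samples from their own center act as shift vectors superimposed on the minority center, so new minority samples inherit majority-class variance. This sketch omits the paper's feature-weight adjustment and confidence filtering; the function name, signature, and uniform choice of shift vector are assumptions.

```python
import random

def variance_transfer_oversample(majority, minority, n_new, seed=0):
    """Generate n_new minority samples by adding majority-class deviations
    (shift vectors) to the minority-class center."""
    rng = random.Random(seed)
    dim = len(minority[0])
    maj_center = [sum(x[k] for x in majority) / len(majority) for k in range(dim)]
    min_center = [sum(x[k] for x in minority) / len(minority) for k in range(dim)]
    # Each majority sample's offset from its own center is a candidate shift.
    shifts = [[x[k] - maj_center[k] for k in range(dim)] for x in majority]
    new_samples = []
    for _ in range(n_new):
        s = rng.choice(shifts)
        new_samples.append([min_center[k] + s[k] for k in range(dim)])
    return new_samples
```

Unlike SMOTE-style interpolation between existing minority points, the synthesized points carry spread learned from the majority class, which is the "additional information" the abstract refers to.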
    Cancer Subtype Prediction Based on Similar Network Fusion Algorithm
    ZHANG Xiaoxi, LI Dongxi
    Computer Science    2024, 51 (6A): 230500006-7.   DOI: 10.11896/jsjkx.230500006
    Mining interaction relationships between genes from gene expression data and constructing gene regulatory networks is an important research topic in bioinformatics. However, currently popular neural networks only consider interactions and associations between genes in their architectures, not those between patients. Therefore, a cancer subtype prediction model based on the fusion of a weighted gene similarity network and a sample similarity network, named WGCSS, is proposed in this paper. The method fuses information from the feature space and the sample space, considers interactions between both genes and samples, and uses a graph convolutional network for prediction. Because aggregating information in two spaces causes a serious over-smoothing problem, a residual layer is introduced into the model to alleviate it. Aggregating the data information in the two spaces makes cancer subtype prediction more accurate. To verify the generalization performance of the method, datasets of invasive breast carcinoma (BRCA), glioblastoma multiforme (GBM), and lung cancer (LUNG) are analyzed, and the resulting high classification accuracy demonstrates the superiority of the method. Survival analysis on the three datasets further shows that the survival curves of the predicted cancer subtypes differ significantly.
    K-step Reachability Query Algorithm for Large Graphs
    TONG Zhengnan, BU Tianming
    Computer Science    2024, 51 (6A): 230500031-10.   DOI: 10.11896/jsjkx.230500031
    A k-step reachability query answers whether there exists a path of length not exceeding k between two vertices in a given directed acyclic graph (DAG). To address the large index size and low query-processing efficiency of existing methods, this paper proposes a multiplicative index built on the large graph based on tree cover to improve query efficiency, and combines the GRAIL algorithm with an improved FELINE algorithm to prune vertex pairs of inherently unreachable queries. Experiments on 19 real datasets compare the proposed algorithm with existing ones on three metrics: index size, index time, and query time. The results verify the efficiency of the proposed algorithm.
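The query itself, independent of any index, is a bounded breadth-first search; a baseline sketch of the semantics the index above accelerates (adjacency-list dict assumed, not the paper's index structure):

```python
from collections import deque

def k_step_reachable(adj, u, v, k):
    """Is there a path from u to v of length at most k?
    Level-by-level BFS, stopping after k levels."""
    if u == v:
        return True
    frontier, seen, depth = deque([u]), {u}, 0
    while frontier and depth < k:
        depth += 1
        # Expand exactly one BFS level per outer iteration.
        for _ in range(len(frontier)):
            for w in adj.get(frontier.popleft(), []):
                if w == v:
                    return True
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return False
```

This runs in O(V + E) per query; the point of the tree-cover index and GRAIL/FELINE-style pruning is to answer most queries without performing this traversal at all.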
    Study on Communication Simulation of Online Hot Search Topics Based on SEIR Model
    YIN Yanyan, WANG Keke, TIAN Jiaojiao, LI Mo, XUE Yaxin, LU Chunyu, ZHAO Yunpeng
    Computer Science    2024, 51 (6A): 230500107-6.   DOI: 10.11896/jsjkx.230500107
    Online hot search topics spread and diffuse. Current research on them mainly focuses on evaluating communication effects, predicting communication trends, assessing social impact, and guiding public opinion, but fails to reveal how communication-dynamics parameters affect the propagation process. In this paper, the SEIR model is used to construct a dynamic model of online hot search topic propagation, and the influence of the network average degree, the distrust probability, the immediate transmission probability after contact, the infection rate, the cure rate, and the recurrence rate on the model is analyzed.
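The SEIR dynamics the abstract analyzes can be reproduced with a simple Euler integration of the classic compartment equations. Parameter names below follow the standard SEIR convention (beta: transmission rate, sigma: incubation-to-infectious rate, gamma: recovery rate), not necessarily the paper's notation, and the topic-specific parameters (distrust, recurrence) are omitted.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, steps, dt=0.1):
    """Euler-integrate dS/dt = -beta*S*I/N, dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I, dR/dt = gamma*I."""
    s, e, i, r = s0, e0, i0, r0
    n = s + e + i + r  # total population, conserved by construction
    history = [(s, e, i, r)]
    for _ in range(steps):
        ds = -beta * s * i / n
        de = beta * s * i / n - sigma * e
        di = sigma * e - gamma * i
        dr = gamma * i
        s, e, i, r = s + ds * dt, e + de * dt, i + di * dt, r + dr * dt
        history.append((s, e, i, r))
    return history
```

Sweeping beta, sigma, or gamma over such a simulation is how one would study, as the paper does, each parameter's influence on the propagation curve.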
    User Interest Recognition Method Incorporating Category Labels and Topic Information
    KANG Zhiyong, LI Bicheng, LIN Huang
    Computer Science    2024, 51 (6A): 230500169-8.   DOI: 10.11896/jsjkx.230500169
    Discovering social media users' interests is of great significance for alleviating information overload, personalizing recommendation, and positively guiding information dissemination. Existing interest recognition research fails to consider simultaneously how topic information and the corresponding category labels help the model learn text features. Therefore, a user interest recognition method incorporating category labels and topic information is proposed. Firstly, semantic features of text and label sequences are extracted separately using the BERT pre-trained model, the BiLSTM model, and a multi-head self-attention mechanism, and a label attention mechanism is introduced so that the model attends more to words related to the text's category label. Secondly, text topic features are obtained with the LDA topic model and the Word2Vec model. A gating mechanism is then designed to fuse the features adaptively, realizing interest classification of texts. Finally, the number of texts a user publishes in each interest category is counted, and the category with the highest count is taken as the user's recognized interest. To verify the effectiveness of the proposed method, a Weibo user interest recognition dataset is constructed. Experimental results show that the model achieves optimal performance in both Weibo text classification and user interest recognition.
    CTGANBoost: Credit Fraud Detection Based on CTGAN and Boosting
    ZHUO Peiyan, ZHANG Yaona, LIU Wei, LIU Zijin, SONG You
    Computer Science    2024, 51 (6A): 230600199-7.   DOI: 10.11896/jsjkx.230600199
    In the financial industry, credit fraud detection is an important task that can prevent substantial economic losses for banks and consumer institutions. However, credit data suffer from class imbalance and overlapping features between positive and negative samples, which lead to low sensitivity to the minority class and poor data discrimination. To address these problems, a CTGANBoost method is proposed for credit fraud detection. First, in each Boosting iteration of AdaBoost, a conditional tabular generative adversarial network (CTGAN) constrained by class-label information is introduced to learn the feature distribution and augment the minority class. Secondly, based on the augmented dataset synthesized by CTGAN, a weight normalization method is designed to preserve the distribution characteristics and relative weights of the original dataset during sample weighting. Experimental results on three open-source datasets show that CTGANBoost outperforms other mainstream credit fraud detection methods, with AUC values improving by 0.5%~2.0% and F1 values by 0.6%~1.8%, verifying the effectiveness and generalization ability of the CTGANBoost method.
    Improved K-means Photovoltaic Energy Data Cleaning Method Based on Autoencoder
    PENG Bo, LI Yaodong, GONG Xianfu
    Computer Science    2024, 51 (6A): 230700070-5.   DOI: 10.11896/jsjkx.230700070
    The development of smart grids has produced massive energy data, and data quality is the foundation of tasks such as data value mining. However, abnormal data are inevitable during the collection and transmission of large-scale, multi-source photovoltaic energy data, so data cleaning is required. Traditional data cleaning models based on statistical machine learning have clear limitations. This paper proposes an improved K-means clustering model based on the Transformer autoencoder structure for cleaning energy big data. It adaptively determines the number of clusters with the elbow method and uses autoencoder networks to compress and reconstruct data within clusters, thereby detecting and recovering abnormal data. Additionally, the proposed model employs the Transformer's multi-head attention mechanism to learn correlated features among the data, strengthening the screening of abnormal data. Experimental results on a publicly available photovoltaic power generation dataset demonstrate that, compared to other methods, the proposed model detects abnormal data better, with a screening accuracy above 96%. Moreover, it can recover abnormal data to a certain extent, providing effective support for energy big data applications.
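Once an autoencoder produces reconstructions, the detection step reduces to thresholding reconstruction errors. A sketch with an assumed mean-plus-k-sigma threshold (the paper's actual criterion is not stated); here the reconstructions are given rather than produced by a Transformer autoencoder.

```python
def flag_anomalies(values, reconstructed, n_sigma=3.0):
    """Flag indices whose reconstruction error exceeds
    mean_error + n_sigma * std_error."""
    errors = [abs(a - b) for a, b in zip(values, reconstructed)]
    mu = sum(errors) / len(errors)
    std = (sum((e - mu) ** 2 for e in errors) / len(errors)) ** 0.5
    threshold = mu + n_sigma * std
    return [i for i, e in enumerate(errors) if e > threshold]
```

The underlying assumption is that the autoencoder, trained on mostly normal in-cluster data, reconstructs normal points well and abnormal points poorly, so large errors mark anomalies.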
    Study on Industrial Defect Augmentation Data Filtering Based on OOD Scores
    YIN Xudong, CHEN Junyang, ZHOU Bo
    Computer Science    2024, 51 (6A): 230700111-7.   DOI: 10.11896/jsjkx.230700111
    In deep learning-based industrial defect detection, data augmentation is crucial for mitigating the scarcity of defect data. However, effectively selecting augmented data from a vast pool of candidates remains unexplored, hampering the improvement of industrial detection models. To address this, this study investigates filtering industrial defect augmented data by out-of-distribution (OOD) scores. Augmented industrial data are generated with the pix2pix network. OOD scores are then computed with a deep-ensemble scoring method, which allows the augmented data to be grouped by score, and the distribution of the augmented data is analyzed through dimensionality-reduction projection views. Finally, defect detection is performed on the grouped augmented data with object detection algorithms, and the impact of the degree of out-of-distribution on augmented data quality is investigated through the accuracy gain of the object detection model. Experimental results show a substantial distribution gap between industrial defect augmented data with higher OOD scores and the training data. Using this subset to expand the training data enhances model generalization and significantly improves the detection accuracy of the object detection algorithm.
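A common deep-ensemble OOD score, and one plausible reading of the scoring method above, is the disagreement among ensemble members' predictions: members trained from different initializations diverge more on inputs far from the training distribution. The exact score below (mean per-class variance) is an illustrative assumption.

```python
def ensemble_ood_score(prob_lists):
    """OOD score for one input: average per-class variance of the
    probability vectors produced by the ensemble members.
    prob_lists: one probability vector per member."""
    n_members = len(prob_lists)
    n_classes = len(prob_lists[0])
    score = 0.0
    for c in range(n_classes):
        col = [p[c] for p in prob_lists]
        mu = sum(col) / n_members
        score += sum((x - mu) ** 2 for x in col) / n_members
    return score / n_classes
```

Ranking augmented samples by this score and bucketing them is the grouping step the abstract describes; samples in the highest-score bucket are the ones whose distribution differs most from the training data.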
    Study on Client Selection Strategy and Dataset Partition in Federated Learning Based on Edge-TB
    ZHOU Tianyang, YANG Lei
    Computer Science    2024, 51 (6A): 230800046-6.   DOI: 10.11896/jsjkx.230800046
    Federated learning is a real-world application of distributed machine learning. In view of the heterogeneity in federated learning, this paper proposes, based on the FedProx algorithm, a client selection strategy that preferentially selects clients with large proximal terms. It outperforms the common strategy of selecting clients with large local loss values, effectively improving the convergence rate of the FedProx algorithm under heterogeneous data and systems and increasing accuracy within a limited number of aggregation rounds. Following the heterogeneous-data assumptions of federated learning, a heterogeneous data partition procedure is designed, yielding heterogeneous federated datasets based on real image datasets as the experimental datasets. Using the open-source distributed machine learning framework Edge-TB as the test platform and a heterogeneously partitioned Cifar10 as the dataset, experiments show that with the new client selection strategy, the improved FedProx algorithm improves accuracy by 14.96% and reduces communication overhead by 6.3% compared to the original algorithm within a limited number of aggregation rounds. Compared with the SCAFFOLD algorithm, accuracy improves by 3.6%, communication overhead falls by 51.7%, and training time falls by 15.4%.
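The selection strategy can be sketched directly: rank clients by the size of their FedProx proximal term ||w_i − w_global||² and pick those with the largest values. Flat weight vectors and simple top-k selection are simplifying assumptions; real FedProx weights are per-layer tensors.

```python
def select_clients(global_w, client_ws, k):
    """Return indices of the k clients whose local models diverge most
    from the global model, measured by the FedProx proximal term."""
    def prox(w):
        return sum((a - b) ** 2 for a, b in zip(w, global_w))
    ranked = sorted(range(len(client_ws)),
                    key=lambda i: prox(client_ws[i]), reverse=True)
    return ranked[:k]
```

The intuition matching the abstract: clients whose local solutions drift furthest from the global model are the ones whose updates the aggregation most needs to see, so prioritizing them speeds convergence under heterogeneity.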
    Study on Three-level Short Video User Portrait Based on Improved Topic Model Method
    HUANG Yumin, ZHAO Chanchan
    Computer Science    2024, 51 (6A): 230800093-7.   DOI: 10.11896/jsjkx.230800093
    Aiming at the problem of quickly extracting accurate user interests from massive short video data, user data, and interaction data, a three-level label user portrait construction method based on topic models is proposed. Building on the topic construction method, the video topic words obtained by fusing the LDA and GSDMM topic models serve as user interest expression vectors. Firstly, an LDA filter is built to eliminate topic-irrelevant text by threshold comparison, reducing the scale of the text and the influence of non-primary corpus on interest vector generation. Then, a feature-word weight matrix combining semantic and context information is constructed: a Bi-GRU neural network computes the contextual features of word vectors, while the word-frequency weight calculated by the TF-IDF algorithm provides the semantic feature; combining the two expands the meaning of feature words. Finally, a GSDMM model with interest weight distribution learns the feature-vector weight matrix, generating user interest tags and correcting interest weights under the influence of different user preferences. Experiments show that this method represents user portraits more completely and accurately than single-topic construction methods and performs well in clustering. With a complete user portrait, the user's pain points can be accurately grasped, supporting subsequent personalized recommendation.
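The TF-IDF semantic-feature step mentioned above is standard and can be sketched in a few lines; the paper may use a different tf·idf variant, so treat the exact formula (relative term frequency times log inverse document frequency) as an assumption.

```python
import math

def tfidf_weights(docs):
    """Weight of word w in doc d: tf(w, d) * log(N / df(w)).
    docs: list of token lists. Returns one {word: weight} dict per doc."""
    n = len(docs)
    df = {}  # document frequency per word
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    weights = []
    for doc in docs:
        tf = {}
        for w in doc:
            tf[w] = tf.get(w, 0) + 1
        weights.append({w: (c / len(doc)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights
```

Words appearing in every document get weight zero, so the matrix naturally down-weights function words while emphasizing topic-bearing terms, which is what makes it a usable semantic feature alongside the Bi-GRU context feature.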
    Ontology-driven Study on Information Structuring of Aeronautical Information Tables
    LAI Xin, LI Sining, LIANG Changsheng, ZHANG Hengyan
    Computer Science    2024, 51 (6A): 230800150-7.   DOI: 10.11896/jsjkx.230800150
    The aeronautical information publication (AIP) is the main carrier recommended by ICAO for presenting each country's aeronautical information, in which a large amount of aeronautical data and aeronautical operation restriction information exists in the form of tables. To enable intelligent querying of AIP and to facilitate the extraction and use of the static data within it, feature extraction and structuring must be performed on AIP's tabular information. This paper proposes an ontology-driven structured extraction method for aeronautical information tabular data, taking the tabular data in AIP as the research object. Firstly, an ontology framework of aeronautical information is constructed to describe domain knowledge in a unified and standardized way. Secondly, the layout structure of form documents is studied and preprocessed using Document AI, and feature entity extraction is verified and analyzed with a random forest algorithm and a conditional random field (CRF) model. Experimental results show that the proposed method effectively extracts the feature entities in AIP and provides a reference for deep mining of static data in the aeronautical information domain.
    RM-RT2NI:A Recommendation Model with Review Timeliness and Trusted Neighbor Influence
    HAN Zhigeng, ZHOU Ting, CHEN Geng, FU Chunshuo, CHEN Jian
    Computer Science    2024, 51 (6A): 230800160-7.   DOI: 10.11896/jsjkx.230800160
    While recommendation models based on matrix factorization can handle high-dimensional rating data, they are vulnerable to rating sparsity. Models that combine ratings and reviews alleviate the sparsity issue by incorporating the latent user preferences and item attribute information embedded in reviews, but they often neglect review timeliness and trusted-neighbor influence during feature extraction, limiting how comprehensively user and item characteristics are captured. To further enhance accuracy, a novel recommendation model named RM-RT2NI is proposed that integrates review timeliness and trusted-neighbor influence. From the rating matrix, it uses matrix factorization to extract shallow features of user preferences and item attributes, and it employs cloud modeling, a refined user similarity assessment model, and a newly constructed credibility assessment model to capture trusted-neighbor influence. From review text, BERT obtains latent representations of individual reviews, a bidirectional GRU captures inter-review relationships, and an attention mechanism incorporating timeliness evaluates each review's timeliness contribution, yielding deep features of users and items. The shallow and deep user features are then fused with the credibility-enhanced neighbor-influence features to form comprehensive user representations, and shallow and deep item features are likewise merged to generate comprehensive item representations. These representations are fed into a fully connected neural network to predict user-item ratings. Experimental evaluation on five publicly available datasets shows that, compared with seven baseline models, RM-RT2NI achieves superior rating prediction accuracy, with an average RMSE reduction of 3.0657%.
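One plausible way to fold timeliness into review attention, as the RM-RT2NI abstract describes, is to discount each review's raw attention score by its age before the softmax, so newer reviews contribute more. The exponential-decay form and the decay rate are illustrative assumptions, not the paper's formulation.

```python
import math

def timeliness_attention(scores, ages, decay=0.1):
    """Softmax over attention scores discounted by review age.
    scores: raw attention logits per review; ages: review ages
    (e.g., in months); decay: assumed discount rate."""
    # Subtracting decay*age in logit space equals multiplying
    # the softmax numerator by exp(-decay * age).
    adjusted = [s - decay * a for s, a in zip(scores, ages)]
    m = max(adjusted)  # shift for numerical stability
    exps = [math.exp(x - m) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]
```

With equal raw scores, an older review ends up with strictly less attention weight, which is exactly the "timeliness contribution" behavior the abstract attributes to the mechanism.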
    Diversified Recommendation Based on Light Graph Convolution Networks and Implicit Feedback Enhancement
    HUANG Chungan, WANG Guiping, WU Bo, BAI Xin
    Computer Science    2024, 51 (6A): 230900038-11.   DOI: 10.11896/jsjkx.230900038
    In recent years, researchers have striven to improve the accuracy of recommendation systems while ignoring the critical impact of diversity on user satisfaction. Most current diversified recommendation algorithms impose diversity constraints through post-processing, after traditional algorithms generate an accuracy-oriented candidate list. This decoupled design consistently yields a sub-optimal system. Meanwhile, although graph convolution networks (GCN) have proven effective at improving recommendation accuracy, their applicability to and diversity design for recommendation remain neglected. In addition, recommendation algorithms that rely on the single explicit feedback of purchasing inevitably fall into "recommendation overload". Therefore, an end-to-end diversified light graph convolution networks recommendation model (DLGCRec) is proposed to overcome these drawbacks. Firstly, GCN is simplified to light graph convolution networks (LGCN) suitable for recommendation, and LGCN pushes diversity upstream into the accuracy-matching recommendation process. Then, in LGCN's sampling phase, diversity-boosted negative sampling that incorporates implicit user feedback explores the user's diversified preferences. Finally, a multi-layer feature fusion strategy captures the complete feature embeddings of the nodes to enhance recommendation performance. Experimental results on real datasets validate the effectiveness of DLGCRec in recommendation and in enhancing diversity. Further ablation studies confirm that DLGCRec effectively mitigates the accuracy-diversity dilemma.
    Data Augmentation for Cardiopulmonary Exercise Time Series of Young Hypertensive Patients Based on Active Barycenter
    HUANG Fangwan, LU Juhong, YU Zhiyong
    Computer Science    2023, 50 (6A): 211200233-11.   DOI: 10.11896/jsjkx.211200233
    With the gradual rise of precision medicine, mining the cardiopulmonary exercise time series of young hypertensive patients can reveal how different individuals respond to aerobic exercise training, helping improve the efficiency of hypertension management plans and deliver aerobic exercise interventions more effectively. One bottleneck in this research is the difficulty of obtaining sufficient sample data. To solve this problem, this paper adopts the weighted dynamic-time-warping barycenter averaging (WDBA) algorithm for time series data augmentation, focusing on barycenter selection and weight assignment. The concept of an active barycenter is introduced for the first time, and selection strategies for representative barycenters and diversity barycenters are proposed to improve the effect of data augmentation. Furthermore, to remedy the shortcomings of existing weight-assignment strategies, a distance-decreasing random strategy is proposed that avoids synthesizing duplicate samples and further improves the generalization ability of the model. Experimental results show that considering both barycenter selection and weight assignment in data augmentation further improves the accuracy of predicting the efficacy of aerobic exercise intervention in young hypertensive patients.
    Explainable Constraint Mechanism for Modeling Temporal Sentiment Memory in Sequential Recommendation
    ZHENG Lin, LIN Yixuan, ZHOU Donglin, ZHU Fuxi
    Computer Science    2023, 50 (6A): 220100066-8.   DOI: 10.11896/jsjkx.220100066
    In recent years, sequential recommendation research has developed rapidly in the recommendation field; existing methods are good at capturing users' sequential behavior to predict preferences. Some advanced methods integrate users' sentiment information to guide behavior mining, but these sentiment-based models neither mine the relations among multi-category user sentiment sequences nor intuitively explain how temporal sentiments contribute to user preferences. To make up for these shortcomings, this paper first attempts to store temporal sentiments in the form of memory and impose constraints on them. Specifically, two mechanisms, sentiment self-constraint and sentiment mutual-constraint, are proposed to explore the associations among multiple categories of sentiments and to assist user behaviors in completing sequential recommendations. Furthermore, the proposed memory framework records users' temporal sentiment attention, so it can provide a degree of intuitive explanation while accurately predicting users' temporal preferences. Experimental results show that the approach outperforms existing state-of-the-art sequential methods and offers better explainability than sentiment-based sequential recommendation models.
    Study on Multibeam Sonar Elevation Data Prediction Based on Improved CNN-BP
    XIONG Haojie, WEI Yi
    Computer Science    2023, 50 (6A): 220100161-4.   DOI: 10.11896/jsjkx.220100161
    To establish an accurate multibeam sonar elevation data prediction model and solve the accuracy problem of predicting the void volume of artificial reefs, a multibeam sonar elevation data prediction method based on a combined model of an improved convolutional neural network (CNN) and a BP neural network is proposed. First, the improved CNN extracts topographic trend features through fully convolutional operations on the elevation data, which are then input to the BP network to further mine internal patterns of topographic change, achieving prediction of multibeam sonar elevation data. Experiments are conducted with multibeam sonar elevation data from a marine ranch and cross-validated using the void volume of artificial reefs, with comparisons against traditional kriging, BP, GA-BP, and PSO-BP models. The results show that the improved CNN-BP model predicts both multibeam sonar elevation data and artificial reef void volume best, verifying the feasibility, reliability, and high accuracy of the proposed method.
    Analysis of Academic Network Based on Graph OLAP
    YANG Heng, ZHU Yan
    Computer Science    2023, 50 (6A): 220100237-5.   DOI: 10.11896/jsjkx.220100237
    In recent years, academia has accumulated a large amount of data. As an effective way to represent and analyze big data, network structures have rich dimensions and can model a large amount of real-life data. Graph online analytical processing (Graph OLAP) technology inherits the ideas of traditional OLAP, allowing users to analyze multidimensional network data from different angles and granularities. However, most existing graph OLAP technologies revolve around building data cubes, most related operations are simple extensions of traditional OLAP onto graph data, and the resulting models have weak ability to mine the topology of the network itself. To this end, an academic network constellation model and related graph OLAP analysis algorithms are first designed, highlighting the topological structure of academic networks more clearly and improving the analysis capability of graph OLAP. Secondly, a corresponding materialization strategy is proposed, effectively improving the efficiency of graph OLAP analysis.
    Local Community Detection Algorithm for Attribute Networks Based on Multi-objective Particle Swarm Optimization
    ZHOU Zhiqiang, ZHU Yan
    Computer Science    2023, 50 (6A): 220200015-6.   DOI: 10.11896/jsjkx.220200015
    Abstract(222)      PDF(pc)(2651KB)(309)
    Community structure is an important feature of complex networks, and the goal of local community detection is to find a community subgraph containing a given set of seed nodes. Traditional local community detection algorithms usually rely only on the topology of the network for the community query, ignoring the rich node attribute information in the network. A local community detection algorithm based on multi-objective particle swarm optimization is therefore proposed for attribute networks, which are realistic and widespread. First, attribute relationship edges are constructed based on the attribute similarity between nodes and their multi-order neighbours, and topological relationship edges are obtained by weighting the network structure with motif information; the two kinds of relationship edges around the core nodes are then sampled with a random walk algorithm to obtain candidate node sets. On this basis, the candidate node sets are iteratively filtered by a multi-objective particle swarm optimization algorithm to obtain a community structure that is topologically tight and homogeneous in attributes. Experimental results on real datasets show that the proposed method improves the performance of local community detection.
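The two objectives such a multi-objective optimizer trades off can be sketched as follows: topological tightness as internal edge density, and attribute homogeneity as mean pairwise Jaccard similarity of node attribute sets. Both measures are illustrative stand-ins, not the paper's actual objective functions, and the tiny graph is made up.

```python
from itertools import combinations

def community_objectives(nodes, adj, attrs):
    """Evaluate a candidate community on two objectives: internal edge density
    (topological tightness) and mean pairwise Jaccard similarity of node
    attribute sets (attribute homogeneity). Both are to be maximised."""
    pairs = list(combinations(list(nodes), 2))
    density = sum(v in adj[u] for u, v in pairs) / len(pairs)
    homogeneity = sum(len(attrs[u] & attrs[v]) / len(attrs[u] | attrs[v])
                      for u, v in pairs) / len(pairs)
    return density, homogeneity

# Hypothetical attribute network: adjacency sets plus keyword attributes
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}
attrs = {1: {"ml", "ai"}, 2: {"ml"}, 3: {"ml", "db"}, 4: {"bio"}}
d, h = community_objectives({1, 2, 3}, adj, attrs)
```

A particle in the swarm encodes one candidate node subset; non-dominated sorting over these two scores then filters the candidate sets.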
    Spatial-Temporal Graph-CoordAttention Network for Traffic Forecasting
    LIU Jiansong, KANG Yan, LI Hao, WANG Tao, WANG Hailing
    Computer Science    2023, 50 (6A): 220200042-7.   DOI: 10.11896/jsjkx.220200042
    Abstract(364)      PDF(pc)(2713KB)(386)
    Traffic prediction is an important research component of urban intelligent transportation systems, making travel more efficient and safer. Accurately predicting traffic flow remains a huge challenge because of complex temporal and spatial dependencies. In recent years, graph convolutional networks (GCN) have shown great potential for traffic prediction, but GCN-based models tend to capture temporal and spatial dependencies separately, ignoring the dynamic correlation between them and failing to integrate them well. In addition, previous approaches use real-world static traffic networks to construct spatial adjacency matrices, which may ignore dynamic spatial dependencies. To overcome these limitations and improve model performance, a novel spatial-temporal Graph-CoordAttention network (STGCA) is proposed. Specifically, a spatial-temporal synchronization module is proposed to model the spatial-temporal dependence of crossing relations at different moments. Then, a dynamic graph learning scheme is proposed to mine latent graph information based on the data correlation between traffic flows. Compared with existing baseline models on four publicly available datasets, STGCA exhibits excellent performance.
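One simple way to derive a graph from data correlation between traffic flows, rather than from the static road network, is sketched below: Pearson correlation between node flow series, keeping each node's strongest neighbours. STGCA learns its graph end-to-end during training; this top-k correlation construction is only an illustrative assumption of the underlying idea.

```python
import numpy as np

def dynamic_adjacency(flows, keep=2):
    """Build a data-driven adjacency matrix from per-node flow series:
    Pearson correlation between every pair of series, keep each node's
    top-`keep` positive correlations as weighted edges, row-normalise."""
    corr = np.corrcoef(flows)
    np.fill_diagonal(corr, -np.inf)          # forbid self-loops
    adj = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        top = np.argsort(corr[i])[-keep:]    # strongest-correlated neighbours
        adj[i, top] = np.maximum(corr[i, top], 0.0)
    rowsum = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, rowsum, out=np.zeros_like(adj), where=rowsum > 0)

flows = np.random.default_rng(0).random((5, 24))  # 5 sensors x 24 time steps
A = dynamic_adjacency(flows)
```

The resulting matrix can replace (or be mixed with) the static road-network adjacency in a graph convolution.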
    Recommendation Model Based on Decision Tree and Improved Deep & Cross Network
    KE Haiping, MAO Yijun, GU Wanrong
    Computer Science    2023, 50 (6A): 220300084-7.   DOI: 10.11896/jsjkx.220300084
    Abstract(177)      PDF(pc)(2920KB)(396)
    Feature mining is a key step in learning the interactions between users and items in a recommendation model, and it is of great significance for improving recommendation accuracy. Among existing feature mining models, the linear logistic regression model is simple and can achieve a good fit, but its generalization ability is weak and it requires a large number of feature parameters. The Deep & Cross network can effectively realize cross extraction of features, but its ability to represent data features is still insufficient. Therefore, by introducing the ideas of multiple residual structures and cross coding, an improved Deep & Cross network recommendation model based on decision trees is proposed. First, a tree structure based on the GBDT algorithm is designed to construct enhanced features, which strengthens the deep mining of latent features. Second, the input dimension of the model's embedding layer is enlarged and optimized. Finally, the improved Deep & Cross network model is used for recommendation prediction. This design not only overcomes the limitations of existing models in generalization ability, but also keeps the feature parameters simple and strengthens their representation ability, so as to effectively mine hidden user associations and improve recommendation accuracy. Experimental results on a public test dataset show that the prediction performance of the proposed model is better than that of existing feature interaction methods.
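The tree-based "enhanced feature" construction can be sketched as follows: each sample is routed to a leaf of a decision tree, and the one-hot leaf index is appended to the raw features before they enter the network. In the paper the trees come from a trained GBDT; here a single hand-written tree over hypothetical (age, income) features illustrates only the leaf-index encoding.

```python
def leaf_index(x):
    """Hypothetical 2-level decision tree over (age, income); returns leaf id 0-3.
    A real GBDT would learn these split thresholds from data."""
    if x[0] < 30:
        return 0 if x[1] < 50 else 1
    return 2 if x[1] < 50 else 3

def enhanced_features(x, n_leaves=4):
    """Raw features concatenated with the one-hot encoding of the leaf the
    sample falls into -- the 'enhanced feature' fed to the embedding layer."""
    onehot = [0.0] * n_leaves
    onehot[leaf_index(x)] = 1.0
    return list(x) + onehot

print(enhanced_features((25, 80)))  # [25, 80, 0.0, 1.0, 0.0, 0.0]
```

With an ensemble of trees, one one-hot segment per tree is appended, which is what enlarges the embedding-layer input dimension.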
    Dynamic Neighborhood Density Clustering Algorithm Based on DBSCAN
    ZHANG Peng, LI Xiaolin, WANG Liyan
    Computer Science    2023, 50 (6A): 220400127-7.   DOI: 10.11896/jsjkx.220400127
    Abstract(235)      PDF(pc)(3072KB)(347)
    Traditional density clustering algorithms do not consider attribute differences between data points in the clustering process, but treat all data points as homogeneous. Based on the traditional DBSCAN algorithm, a dynamic-neighborhood density-based spatial clustering of applications with noise (DN-DBSCAN) is proposed. In DN-DBSCAN, each point's neighborhood radius is determined by its own properties, so the neighborhood radius changes dynamically. Thus, the different influences that points with different properties exert on the dataset are reflected in the clustering results, giving the density clustering algorithm more practical meaning and making it more suitable for solving practical problems. On the basis of an example analysis, the DN-DBSCAN algorithm is applied to the urban agglomeration division problem in the Yangtze River Delta, and its results are compared with those of the DBSCAN, OPTICS and DPC algorithms. The results show that DN-DBSCAN can reasonably divide the urban agglomerations of the Yangtze River Delta according to the different attributes of each city, with an accuracy of 95%, much higher than the accuracies of 85%, 85% and 88% achieved by the other three algorithms, indicating that it has a better ability to solve practical problems.
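The per-point dynamic radius can be sketched with a minimal DBSCAN variant in which each point's eps comes from its own attributes. The radius function and the sample points below are illustrative assumptions; the paper derives the radius from city attributes, not from this toy formula.

```python
import math

def dn_dbscan(points, eps_of, min_pts=3):
    """Toy DBSCAN variant: each point's neighborhood radius is computed from
    its own attributes via eps_of(point) instead of one global eps."""
    def neighbors(i):
        e = eps_of(points[i])
        return [j for j in range(len(points))
                if math.dist(points[i][:2], points[j][:2]) <= e]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                  # provisionally noise
            continue
        labels[i] = cid                     # core point: start a new cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid             # border point absorbed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(neighbors(j)) >= min_pts:
                queue.extend(neighbors(j))  # j is also core: keep expanding
        cid += 1
    return labels

# Two well-separated groups; the third field is a per-point attribute that
# inflates that point's own search radius.
pts = [(0, 0, 1), (0.1, 0, 1), (0, 0.1, 1), (0.1, 0.1, 1),
       (5, 5, 2), (5.1, 5, 2), (5, 5.1, 2), (5.1, 5.1, 2)]
labels = dn_dbscan(pts, eps_of=lambda p: 0.3 + 0.1 * p[2])
```

Setting `eps_of` to a constant recovers ordinary DBSCAN.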
    Temporal Hierarchical Data Management Based on Nested Intervals Scheme in Relational Database
    YANG Zhenkai, CAO Yibing, ZHAO Xinke, ZHENG Jingbiao
    Computer Science    2023, 50 (6A): 220500290-5.   DOI: 10.11896/jsjkx.220500290
    Abstract(162)      PDF(pc)(2361KB)(281)
    Temporal hierarchical data is hierarchical data characterized along the time dimension, and it is used to model hierarchical structures that change over time. Compared with management methods for ordinary hierarchical data, temporal hierarchical data management still faces problems such as complex storage scheme design and inefficient query and update. To solve these problems, a temporal hierarchical data management method based on a nested intervals scheme is proposed. Four types of change in hierarchical data are first analyzed from the perspective of node changes; on this basis, storage and query of multi-version nodes in a relational database are realized by extending time labels. Finally, the abundantly gapped nested intervals scheme (AGNIS) is put forward to solve the inefficiency of data insertion in common nested intervals schemes. Experiments on the data of Chinese administrative divisions and their adjustments from 2021 to 2022 show that the proposed method can store historical hierarchical data and query hierarchical snapshots at any time, with high efficiency in query and update operations.
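The nested-intervals idea can be sketched as follows: each node gets a numeric interval strictly inside its parent's, so ancestor tests become interval containment, and leaving gaps between sibling intervals is what lets later insertions avoid relabelling. The gap-allocation formula and the three-node division tree are illustrative assumptions, not the paper's AGNIS allocation rule.

```python
def assign_intervals(tree, root, lo=0, hi=1 << 20):
    """Gapped nested-intervals labelling: each child gets an interval inside
    its parent's, with unused gaps kept between siblings so future nodes can
    be inserted without relabelling the whole tree."""
    intervals = {}
    def walk(node, lo, hi):
        intervals[node] = (lo, hi)
        kids = tree.get(node, [])
        if kids:
            step = (hi - lo) // (2 * len(kids) + 1)  # alternate gap / child
            for i, c in enumerate(kids):
                walk(c, lo + (2 * i + 1) * step, lo + (2 * i + 2) * step)
    walk(root, lo, hi)
    return intervals

def is_ancestor(intervals, a, b):
    """a is an ancestor of b iff b's interval nests strictly inside a's."""
    la, ra = intervals[a]
    lb, rb = intervals[b]
    return la < lb and rb < ra

tree = {"CN": ["Beijing", "Chongqing"], "Chongqing": ["Yuzhong"]}
iv = assign_intervals(tree, "CN")
print(is_ancestor(iv, "CN", "Yuzhong"))  # True
```

Stored as two integer columns in a relational table, subtree queries become simple range predicates (`WHERE left > :la AND right < :ra`); adding a validity-time column per row gives the multi-version, temporal variant.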
    Improved Forest Optimization Feature Selection Algorithm for Credit Evaluation
    HUANG Yuhang, SONG You, WANG Baohui
    Computer Science    2023, 50 (6A): 220600241-6.   DOI: 10.11896/jsjkx.220600241
    Abstract(305)      PDF(pc)(1795KB)(255)
    Credit evaluation is a key problem in finance: it predicts whether a user is at risk of default and thus reduces bad-debt losses. One of the key challenges in credit evaluation is the presence of a large number of invalid or redundant features in the dataset. To solve this problem, an improved feature selection using forest optimization algorithm (IFSFOA) is proposed. It addresses the shortcomings of the original FSFOA algorithm as follows: in the initialization phase, a cardinality-check-based initialization strategy replaces random initialization to improve the search capability; in the local seeding phase, a multi-level variation strategy optimizes the local search capability and alleviates FSFOA's restricted search space and tendency toward local optima; and when updating the candidate forest, a greedy selection strategy selects high-quality trees, eliminates low-quality trees, and converges the dispersed search process. Comparative experiments on public credit evaluation datasets covering low, medium and high dimensions show that IFSFOA outperforms FSFOA and more recent feature selection algorithms in both classification ability and dimensionality reduction, validating its effectiveness.
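The multi-level variation in the local seeding phase can be sketched as follows: from one candidate feature mask, spawn neighbours that flip 1, 2, ..., L distinct bits, so the search probes increasingly distant regions of the feature space. The mask length and level count are illustrative; the paper's exact variation operator may differ.

```python
import random

def local_seeding(mask, levels=3, seed=0):
    """Multi-level variation for feature selection: `mask` is a binary vector
    (1 = feature kept). Level k produces one neighbour differing from the
    parent in exactly k randomly chosen positions."""
    rng = random.Random(seed)
    children = []
    for k in range(1, levels + 1):
        child = mask[:]
        for i in rng.sample(range(len(mask)), k):  # k distinct flip positions
            child[i] = 1 - child[i]
        children.append(child)
    return children

mask = [1, 0, 1, 1, 0, 0, 1, 0]
children = local_seeding(mask)
```

Each child would then be scored by a classifier's accuracy plus a dimensionality penalty, and the greedy update keeps only the best-scoring trees in the forest.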
    GDLIN:A Learned Index By Gradient Descent
    CHEN Shanshan, GAO Jun, MA Zhenyu
    Computer Science    2023, 50 (6A): 220600256-6.   DOI: 10.11896/jsjkx.220600256
    Abstract(169)      PDF(pc)(2402KB)(343)
    In the era of big data, data access speed is an important indicator of the performance of large-scale storage systems, and indexing is one of the main techniques for improving data access performance in database systems. In recent years, the learned index (LI) has been proposed: it replaces traditional B+-tree indexes with machine learning models, exploits patterns in the underlying data distribution to train the models, and turns the indirect search of a data query into the direct computation of a function, so a learned index can speed up queries and reduce index size. However, the fitting effect of LI is mediocre, and it assumes the data is static and read-only, so it does not support modification operations such as insertion. This paper presents GDLIN, a learned index that uses a gradient descent algorithm to fit the data. Gradient descent reduces the error between the predicted position and the actual position, which in turn reduces the cost of the local search. Besides, GDLIN recursively calls the construction algorithm until only one model is created, which makes full use of the key distribution and prevents the index from growing with the data volume. In addition, GDLIN uses a sorted linked list to handle data insertion. Experimental results demonstrate that GDLIN improves lookup throughput by 2.1× compared with traditional B+-trees in the absence of insertions, and improves lookup performance by 1.08× compared with LI when the insertion factor is 0.5.
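The core idea can be sketched with a single linear model: fit position ≈ a·key + b by gradient descent over the sorted keys, then answer a lookup by predicting a position and doing a bounded local search around it. This one-model sketch omits GDLIN's recursive construction and linked-list insert path; the learning rate, epoch count and search window are illustrative assumptions.

```python
def train(keys, lr=0.5, epochs=2000):
    """Fit position ~= a * key_norm + b by plain gradient descent on the
    squared error between predicted and actual positions of sorted keys."""
    n = len(keys)
    kmin, kmax = keys[0], keys[-1]
    xs = [(k - kmin) / (kmax - kmin) for k in keys]  # normalise for stable GD
    a = b = 0.0
    for _ in range(epochs):
        da = db = 0.0
        for x, y in zip(xs, range(n)):
            err = a * x + b - y
            da += 2 * err * x / n
            db += 2 * err / n
        a -= lr * da
        b -= lr * db
    return a, b, kmin, kmax

def lookup(keys, model, key, window=4):
    """Predict the key's slot, then do a bounded local search around it."""
    a, b, kmin, kmax = model
    pos = round(a * (key - kmin) / (kmax - kmin) + b)
    for i in range(max(0, pos - window), min(len(keys), pos + window + 1)):
        if keys[i] == key:
            return i
    return -1

keys = list(range(0, 200, 2))   # 100 sorted keys
model = train(keys)
```

The smaller the residual error left by gradient descent, the smaller the search window can be, which is exactly the cost the abstract says the fitting step reduces.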
    City Traffic Flow Prediction Method Based on Dynamic Spatio-Temporal Neural Network
    MENG Xiangfu, XU Ruihang
    Computer Science    2023, 50 (6A): 220600266-7.   DOI: 10.11896/jsjkx.220600266
    Abstract(425)      PDF(pc)(2489KB)(353)
    Traffic flow forecasting is of great importance to urban road planning, traffic safety and the building of smart cities. However, most existing traffic prediction models cannot capture the dynamic spatio-temporal correlations of traffic data well enough to obtain satisfactory prediction results. To address this problem, a city traffic flow prediction method based on a dynamic spatio-temporal neural network is proposed. First, by modelling the recent, daily-cycle and weekly-cycle dependencies of the traffic data, a 3D convolutional neural network is applied to each component to extract high-dimensional features of urban traffic. Then, an improved residual structure is used to capture the correlation between remote area pairs and the prediction area, and a fusion of spatial and temporal attention mechanisms captures the dynamic correlations between traffic flows in different areas over different time periods. Finally, the outputs of the three components are weighted and fused using a parameter-matrix-based approach to obtain the prediction results. Experiments on two publicly available datasets, TaxiBJ and BikeNYC, show that the proposed model outperforms mainstream traffic forecasting models.
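The parameter-matrix-based fusion of the three temporal components can be sketched as an element-wise weighting per grid cell. The grid size, the random placeholder matrices and the tanh squashing are illustrative assumptions only; in the model the three weight matrices are learned jointly with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4
# Outputs of the recent, daily-cycle and weekly-cycle components
# for an H x W city grid (random placeholders here)
recent, daily, weekly = (rng.random((H, W)) for _ in range(3))
# One learnable parameter matrix per component
Wr, Wd, Ww = (rng.random((H, W)) for _ in range(3))
# Parameter-matrix fusion: each cell weights each component independently
fused = Wr * recent + Wd * daily + Ww * weekly
pred = np.tanh(fused)  # squash to (-1, 1), a common choice in grid-based flow models
```

Element-wise (rather than scalar) weights let, say, a residential cell lean on the weekly component while a business district leans on the daily one.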
    Anomaly Detection of Time-series Based on Multi-modal Feature Fusion
    ZHANG Guohua, YAN Xuefeng, GUAN Donghai
    Computer Science    2023, 50 (6A): 220700094-7.   DOI: 10.11896/jsjkx.220700094
    Abstract(459)      PDF(pc)(2243KB)(580)
    Effective anomaly detection for multivariate time series is important in data mining and analysis. However, most existing detection methods are based on a single modality, so they cannot effectively utilize the distribution information of time series in multi-modal space, and for multi-modal features there has been no effective adaptive fusion method or method for extracting spatial-temporal dependence. In this paper, a time series anomaly detection method based on multi-modal feature fusion is proposed. A multi-modal feature adaptive fusion module is established, which adaptively fuses multi-modal features through a convolutional network and a soft-selection mode. A spatial-temporal attention module composed of temporal attention and spatial attention is proposed; it extracts the spatial-temporal dependence of the multi-modal features and outputs a spatial-temporal attention vector, from which the model's prediction is obtained. By learning the distribution of normal samples, the anomaly detection result is obtained from the error between the predicted and real values. The proposed method is compared with other state-of-the-art models on four public datasets, and the results demonstrate its effectiveness.
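The soft-selection step of such an adaptive fusion module can be sketched as a softmax gate over per-modality scores. In the paper the scores are produced by a convolutional network; the fixed scores and two hypothetical modalities below illustrate only the weighting-and-summing step.

```python
import numpy as np

def soft_select(features, scores):
    """Soft-selection fusion: a softmax over per-modality scores yields
    weights, and the fused representation is the weighted sum of the
    modality feature vectors (all assumed to share one dimension)."""
    w = np.exp(scores - np.max(scores))   # numerically stable softmax
    w = w / w.sum()
    fused = w @ np.stack(features)        # weighted sum across modalities
    return fused, w

time_feat = np.array([1.0, 0.0, 2.0])    # e.g. time-domain features
freq_feat = np.array([0.5, 1.5, 0.0])    # e.g. frequency-domain features
fused, w = soft_select([time_feat, freq_feat], scores=np.array([2.0, 1.0]))
```

Because the weights are a differentiable function of the scores, the gate is trained end-to-end with the rest of the detector, which is what makes the fusion "adaptive" rather than a fixed average.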
    Review on Methods and Applications of Text Fine-grained Emotion Recognition
    WANG Xiya, ZHANG Ning, CHENG Xin
    Computer Science    2023, 50 (6A): 220900137-7.   DOI: 10.11896/jsjkx.220900137
    Abstract(444)      PDF(pc)(1927KB)(506)
    Emotional information contained in massive texts on the Internet expresses public views and attitudes, and how to identify and utilize these emotional resources has become a research focus in various fields. By reviewing the relevant theories and literature on fine-grained emotion recognition, this paper summarizes its classification methods and application scenarios, and discusses the technical challenges and practical gaps. The analysis finds that fine-grained emotion recognition methods mainly include emotion lexicons, traditional machine learning and neural network learning, and that they are mostly applied to business analysis and public opinion management. Regarding future research trends, first, real-time updating of online emotion words, domain lexicon construction and semantic analysis techniques can be studied; second, how to improve the automatic classification of training data and how to build semi-supervised learning models need further discussion; in addition, research on business analysis and public opinion management can explore the integration of aspect extraction and emotion recognition. This paper summarizes and comments on emotion recognition technology and its applications, providing a reference for subsequent research.
    Tripartite Evolutionary Game Analysis of Medical Data Sharing Under Blockchain Architecture
    YANG Jian, WANG Kaixuan
    Computer Science    2023, 50 (6A): 221000080-7.   DOI: 10.11896/jsjkx.221000080
    Abstract(532)      PDF(pc)(3024KB)(454)
    To promote the development of health and medical big data and the safe sharing of medical data, this paper constructs a tripartite evolutionary game model of the system manager, the data provider and the data demander under a blockchain architecture. First, prospect theory is combined with the evolutionary game, and the parameters of the traditional evolutionary game are transformed through the prospect value function. Second, the possible game equilibria and their evolution trends are discussed. Finally, the influence of different factors on the decisions of each participant in medical data sharing under the blockchain architecture is examined through numerical simulation. The results show that the choice of initial strategy has a significant influence on the stability of the game strategies. The evolution of the system can be accelerated by improving the regulatory benefits of the system manager, reducing the perceived losses of the data provider, and increasing the compensation paid to data demanders for actively reporting non-compliant behavior, thus enhancing the trust of all participants and promoting the formation of trust relationships.
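The building blocks of such a model can be sketched for a single population: the prospect-theory value function that replaces raw payoffs, and one Euler step of the two-strategy replicator equation. The payoffs below are made-up, and a one-population slice is shown only for illustration; the paper's model couples three such populations.

```python
def prospect_value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains, convex and
    loss-averse for losses (alpha, lam as in Tversky & Kahneman, 1992)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def replicator_step(x, payoff_a, payoff_b, dt=0.01):
    """One Euler step of the two-strategy replicator equation
    dx/dt = x(1-x)(V_a - V_b), with payoffs mapped through prospect values,
    where x is the share of the population playing strategy A."""
    return x + dt * x * (1 - x) * (prospect_value(payoff_a) - prospect_value(payoff_b))

# A data provider choosing "share" (hypothetical payoff 3) vs "withhold" (payoff 1)
x = 0.5
for _ in range(1000):
    x = replicator_step(x, 3.0, 1.0)
```

Loss aversion (lam > 1) means perceived losses push strategies away from sharing more strongly than equal-sized gains pull toward it, which is why reducing the provider's perceived losses accelerates convergence.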
      Page 1 of 7, 182 records