Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Editors
    Content of Database & Big Data & Data Science in our journal
    Application Mode and Challenges of Vehicular Big Data
    GE Yu-ming, HAN Qing-wen, WANG Miao-qiong, ZENG Ling-qiu, LI Lu
    Computer Science    2020, 47 (6): 59-65.   DOI: 10.11896/jsjkx.191200165
    Abstract views: 338 | PDF (1450KB) downloads: 1571
    With the technical evolution of connected vehicles, people, vehicles, roads and the cloud are all connected, and a large number of application services have emerged covering areas such as manufacturing, connected-vehicle products, the vehicle service market and intelligent travel services. The core of these applications is vehicular big data, whose effective utilization may be an important breakthrough in the future transformation and upgrading of the automotive industry. To promote the application of vehicular big data in connected vehicles, this paper reviews related work. Starting from the connotation and architecture of vehicular big data, it analyzes, according to application demands, the characteristics of the data sources and the corresponding applications, such as manufacturing, connected-vehicle products and the vehicle service market. The key technologies of vehicular big data are then discussed from four aspects: data collection, data processing and analysis, computing resources, and privacy protection. Based on a comprehensive analysis of the current state of policy and technology, the paper anticipates future application trends of vehicular big data.
    Big Data Decomposition-Fusion and Its Intelligent Acquisition
    LIU Ji-qin, SHI Kai-quan
    Computer Science    2020, 47 (6): 66-73.   DOI: 10.11896/jsjkx.191000072
    Abstract views: 282 | PDF (1484KB) downloads: 724
    This paper introduces the concepts of big data decomposition-fusion and of the big data distance generated by decomposition and fusion. Using these concepts, it presents a union-intersection decomposition theorem and an intersection-union decomposition theorem for big data, together with their attribute conjunction relations. Intelligent generation theorems and the distance relationship of big data fusion are then established, followed by a recognition criterion for big data decomposition-fusion and an intelligent algorithm, with its procedure, for decomposition-fusion acquisition. The application of these theoretical results to intelligent decomposition-fusion acquisition is presented. The paper also gives the new characteristics of ∧-type big data, which is obtained by using the P-sets model.
    Chinese Short Text Summarization Generation Model Based on Semantic-aware
    NI Hai-qing, LIU Dan, SHI Meng-yu
    Computer Science    2020, 47 (6): 74-78.   DOI: 10.11896/jsjkx.190600006
    Abstract views: 610 | PDF (1482KB) downloads: 1506
    Text summarization technology can distill key information from massive data and effectively alleviate information overload. The sequence-to-sequence model is widely used for English abstractive summarization, but it has not been studied in depth for Chinese text. In the conventional sequence-to-sequence model, the decoder treats the hidden states of the words output by the encoder as the overall semantic information through the attention mechanism; however, each hidden state only takes the words immediately before and after the current word into account, so the generated summary may miss the core information of the source text. To solve this problem, a semantic-aware Chinese short text summarization model called SA-Seq2Seq is proposed, based on the sequence-to-sequence model with attention. SA-Seq2Seq applies the pre-trained model BERT in the encoder so that each word representation of the source text contains overall semantic information, and uses the gold summary as the target semantic information in the decoder to compute a semantic inconsistency loss, thus ensuring the semantic integrity of the generated summary. Experiments are carried out on the Chinese short text summarization dataset LCSTS. The results show that SA-Seq2Seq improves significantly over the benchmark model on the ROUGE metric: its ROUGE-1, ROUGE-2 and ROUGE-L scores increase by 3.4%, 7.1% and 6.1% respectively on the character-based version of the dataset, and by 2.7%, 5.4% and 11.7% respectively on the word-based version. SA-Seq2Seq can therefore effectively integrate Chinese short texts and ensure the fluency and consistency of the generated summary, making it applicable to Chinese short text summarization tasks.
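The semantic inconsistency loss described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: it assumes contextual word embeddings for the generated and gold summaries are already available as arrays, and combines a token-level loss with a cosine-based semantic term weighted by a hypothetical coefficient `lam`.

```python
import numpy as np

def cosine_semantic_loss(gen_emb, gold_emb):
    """Semantic inconsistency: 1 - cosine similarity between the mean
    embedding of the generated summary and that of the gold summary."""
    g = gen_emb.mean(axis=0)
    t = gold_emb.mean(axis=0)
    cos = np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t))
    return 1.0 - cos

def combined_loss(token_nll, gen_emb, gold_emb, lam=0.5):
    """Total loss = token-level negative log-likelihood plus the
    weighted semantic inconsistency term."""
    return token_nll + lam * cosine_semantic_loss(gen_emb, gold_emb)
```

A summary whose average embedding points in the same direction as the gold summary incurs no extra penalty; an orthogonal one adds the full `lam`.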
    Noisy Label Classification Learning Based on Relabeling Method
    YU Meng-chi, MU Jia-peng, CAI Jian, XU Jian
    Computer Science    2020, 47 (6): 79-84.   DOI: 10.11896/jsjkx.190600041
    Abstract views: 535 | PDF (1956KB) downloads: 1214
    The integrity of sample labels has a significant impact on the accuracy of supervised learning algorithms. In real data, however, because the labeling process is often unprofessional and arbitrary, the labels of a dataset are inevitably polluted by noise, i.e. the assigned label of a sample differs from its real label. To reduce the negative impact of noisy labels on classifier accuracy, this paper proposes a noisy label correction approach. It first identifies noisy-label data by applying a base classifier to the samples and estimating the noise rate, and then uses the base classifier to relabel the noisy samples, yielding a dataset in which the noisy samples have been corrected. Experiments on synthetic and real datasets show that the relabeling algorithm improves classification results under different base classifiers and different types of noise rate interference. Compared with the base classifier, its accuracy improves by about 5% on the synthetic dataset, while in high-noise settings on the CIFAR and MNIST datasets its F1 score is 7% higher than those of Elk08 and Nat13 on average, and 53% higher than that of the base classifier.
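The identify-then-relabel loop can be illustrated with a toy sketch. The nearest-centroid rule below is only a stand-in for whatever base classifier the paper uses; the point is the pattern of flagging samples the classifier disagrees with and overwriting their labels.

```python
import numpy as np

def relabel_noisy(X, y):
    """Flag samples whose given label disagrees with a nearest-centroid
    base classifier, and relabel them with the classifier's prediction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # class centroids estimated from the (possibly noisy) labels
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[d.argmin(axis=1)]
    flagged = pred != y
    corrected = np.where(flagged, pred, y)
    return corrected, flagged
```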
    Data Composition View Positioning Update Approach with Incremental Logs
    ZHANG Yuan-ming, LI Meng-ni, HUANG Lang-you, LU Jia-wei, XIAO Gang
    Computer Science    2020, 47 (6): 85-91.   DOI: 10.11896/jsjkx.190500085
    Abstract views: 218 | PDF (2398KB) downloads: 611
    Data resources stored in different units and departments in the cloud environment are cross-domain, heterogeneous and complex. As a unified data model for cross-origin, heterogeneous data sources, a data service can publish data sources in the form of services and generate a data composition view by composing several data services according to users' data requirements. Since the data sources are autonomous, updating the data composition view in real time at minimal cost becomes a key issue. This paper proposes a positioning update approach for data composition views based on incremental logs. The latest changes of the data sources are captured from incremental logs, and the attributes and tuples in the data composition view are indexed; the index numbers of tuples are calculated from positioning attributes, and the corresponding tuple update operations are performed according to the type of each data change. A log-based update data acquisition algorithm and a data composition view positioning update algorithm are presented. The approach has been evaluated in a cross-origin heterogeneous elevator data service system using datasets from multiple departments. When the proportion of changed tuples is much smaller than the total number of tuples, or when the data composition view has many attributes, the update efficiency of the positioning update approach is much higher than that of existing methods.
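The positioning idea, indexing view tuples by a positioning attribute and applying only the logged changes rather than recomputing the whole view, can be sketched as a toy keyed view; the operation names `insert`/`update`/`delete` are assumptions standing in for whatever the incremental log actually records.

```python
class CompositionView:
    """Toy materialized view keyed by a positioning attribute.
    Applies insert/update/delete operations captured from source logs,
    touching only the affected tuples."""

    def __init__(self, key):
        self.key = key      # name of the positioning attribute
        self.rows = {}      # positioning value -> tuple (dict)

    def apply(self, op, tuple_):
        k = tuple_[self.key]
        if op == "insert":
            self.rows[k] = dict(tuple_)
        elif op == "update":
            self.rows[k].update(tuple_)   # only changed attributes touched
        elif op == "delete":
            self.rows.pop(k, None)
        else:
            raise ValueError(f"unknown log operation: {op}")
```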
    Robust Low Rank Subspace Clustering Algorithm Based on Projection
    XING Yu-hua, LI Ming-xing
    Computer Science    2020, 47 (6): 92-97.   DOI: 10.11896/jsjkx.190500074
    Abstract views: 420 | PDF (1863KB) downloads: 896
    With the advent of the big data era, how to effectively cluster, analyze and use massive high-dimensional data has become a hot research topic. When traditional clustering algorithms are used to process high-dimensional data, the accuracy and stability of the clustering results are low. Subspace clustering algorithms reduce the feature space of the original data to form different feature subsets, lessening the influence of uncorrelated features on clustering results; they can mine information that is hard to expose in high-dimensional data and thus have significant advantages in processing it. Aiming at the limitations of existing graph-based subspace clustering algorithms in dealing with noise of unknown type and in solving complex convex problems, this paper combines subspace clustering with spatial projection theory and proposes a projection-based robust low-rank subspace clustering algorithm. First, the original data is projected; the noise in the projection space is eliminated by coding and missing data is compensated. Then a new mapping is used to construct a sparse l2 similarity graph, and finally subspace clustering is performed on this graph. The algorithm requires no prior knowledge of the noise type, and the l2 graph describes well the sparsity and spatial dispersion of high-dimensional data. Three face recognition datasets are selected as experimental data. The optimal parameters affecting the clustering effect are determined first, and the algorithm is then verified in terms of accuracy, robustness and time complexity. The experimental results show that the algorithm achieves high accuracy, low time complexity and good robustness when noise of unknown type is mixed into the face recognition datasets.
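A minimal sketch of the project-then-graph pipeline follows. The random projection and the k-nearest cosine sparsification below are deliberate simplifications standing in for the paper's coding-based noise elimination and l2-graph construction; they show the shape of the computation, not its exact form.

```python
import numpy as np

def l2_similarity_graph(X, dim=2, k=2, seed=0):
    """Project data to a low-dimensional space, then build a sparse,
    symmetric similarity graph keeping the k strongest |cosine|
    affinities per node (a simplified stand-in for an l2 graph)."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((X.shape[1], dim))
    Z = X @ P                                          # projection step
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    S = np.abs(Z @ Z.T)                                # |cosine| affinities
    np.fill_diagonal(S, 0.0)
    W = np.zeros_like(S)
    for i in range(len(S)):
        nn = np.argsort(S[i])[-k:]                     # k strongest neighbors
        W[i, nn] = S[i, nn]
    return np.maximum(W, W.T)                          # symmetrize
```

The returned `W` would then feed a standard graph-based clustering step such as spectral clustering.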
    Application Research of Improved XGBoost in Imbalanced Data Processing
    SONG Ling-ling, WANG Shi-hui, YANG Chao, SHENG Xiao
    Computer Science    2020, 47 (6): 98-103.   DOI: 10.11896/jsjkx.191200138
    Abstract views: 903 | PDF (1355KB) downloads: 1616
    When dealing with imbalanced data, traditional classifiers tend to guarantee the accuracy of the majority class at the expense of the minority class, resulting in a higher error rate for the minority class. Aiming at this problem, an improved XGBoost method for binary imbalanced data is proposed. The main idea is to address the imbalance at three levels: data, features and algorithm. First, at the data level, Conditional Generative Adversarial Nets (CGAN) learn the distribution of the minority samples, and the trained generator produces supplementary minority samples to adjust the imbalance of the data. Second, at the feature level, XGBoost is used for feature combination to generate new features, and the minimal Redundancy-Maximal Relevance (mRMR) algorithm then screens out a feature subset better suited to imbalanced classification. Finally, at the algorithm level, a Focal Loss function for imbalanced classification is introduced into XGBoost. The improved XGBoost is trained on the new dataset to obtain the final model. In the experiments, G-mean and AUC are selected as evaluation indicators, and the results on 6 KEEL datasets verify the feasibility of the proposed improvement. Compared with four existing imbalanced classification models, the proposed method achieves a better classification effect.
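The algorithm-level change can be illustrated by a Focal Loss usable as a custom objective. The `(grad, hess)` return shape matches what an XGBoost custom objective is expected to produce, but the central-difference derivatives below are a sketch, not the paper's implementation.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights easy examples so the minority
    class contributes relatively more to the gradient."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(y * (1 - p) ** gamma * np.log(p)
             + (1 - y) * p ** gamma * np.log(1 - p))

def focal_objective(preds, labels, gamma=2.0, eps=1e-6):
    """Gradient and Hessian of the focal loss w.r.t. raw scores, via
    central differences, in the (grad, hess) form custom objectives use."""
    def loss_at(z):
        return focal_loss(1.0 / (1.0 + np.exp(-z)), labels, gamma)
    g = (loss_at(preds + eps) - loss_at(preds - eps)) / (2 * eps)
    h = (loss_at(preds + eps) - 2 * loss_at(preds) + loss_at(preds - eps)) / eps ** 2
    return g, np.maximum(h, 1e-12)   # keep Hessian positive for the booster
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, which is a quick sanity check on the weighting term.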
    Enhancer-Promoter Interaction Prediction Based on Multi-feature Fusion
    HU Yu-jia, GAN Wei, ZHU Min
    Computer Science    2020, 47 (5): 64-71.   DOI: 10.11896/jsjkx.191100027
    Abstract views: 419 | PDF (2893KB) downloads: 1657
    Studying the mechanism of enhancer-promoter interaction helps to understand gene regulation, revealing genes relevant to diseases and providing new clinical methods and ideas for disease diagnosis and treatment. Compared with traditional biological analysis methods, which are expensive, time-consuming and, owing to limited resolution, hard-pressed to identify specific interactions precisely, computational methods for biological problems have become a hot research topic in recent years. Such methods can actively learn sequence features and spatial structure through complex network architectures, so as to predict enhancer-promoter interactions precisely and accurately. This paper first introduces the research status of traditional biological detection methods. Then, from the perspective of sequence features and based on the idea of multi-feature fusion, it surveys the application of statistical and deep learning methods to enhancer-promoter interaction prediction. Finally, the research hotspots and challenges in this field are summarized and analyzed.
    Overlapping Community Detection Method Based on Rough Sets and Density Peaks
    ZHANG Qin, CHEN Hong-mei, FENG Yun-fei
    Computer Science    2020, 47 (5): 72-78.   DOI: 10.11896/jsjkx.190400160
    Abstract views: 368 | PDF (1728KB) downloads: 961
    With the development of the Internet and society, a large amount of interrelated, interdependent data is produced every day in various fields, forming complex networks organized around different themes. Mining the community structure of complex networks is an important research topic, of great significance for recommendation systems, behavior prediction and information spreading. Moreover, overlapping community structures exist universally in real life, making them of practical research interest. To detect overlapping communities effectively, this paper proposes OCDRD, an overlapping community detection method based on rough sets and density peaks, in which rough set theory is used to analyze communities and identify overlapping nodes. First, global similarities among network nodes are obtained by grey correlation analysis built on the traditional local similarity measures, and are then converted into distances between nodes. Community center nodes are selected automatically from the network structure by applying the idea of density peaks clustering. Next, the lower approximation, upper approximation and boundary region of each community are defined according to the distance-ratio relation among nodes. Finally, the distance-ratio threshold is adjusted iteratively, recomputing the community boundary regions in each iteration until the optimal overlapping community structure is obtained. OCDRD is compared with community detection algorithms that have performed well in recent years, on both LFR benchmark artificial networks and real network datasets. Analysis of two common evaluation indexes, NMI and the overlapping modularity EQ, shows that OCDRD is superior to the other algorithms in community partition structure and is feasible and effective.
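The density-peaks step for picking community centers can be sketched directly from a distance matrix: a center should have both high local density and a large distance to any denser node. Tie-breaking by index is an assumption added here to keep the sketch deterministic; it is not from the paper.

```python
import numpy as np

def density_peaks_centers(D, dc, n_centers):
    """Pick centers as the points maximizing rho * delta, where rho is
    local density (neighbors within cutoff dc) and delta is the
    distance to the nearest denser point."""
    n = len(D)
    rho = (D < dc).sum(axis=1) - 1          # exclude the point itself
    delta = np.zeros(n)
    for i in range(n):
        denser = [j for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta[i] = min(D[i, j] for j in denser) if denser else D[i].max()
    gamma = rho * delta
    return np.argsort(gamma)[-n_centers:]
```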
    Stock Volatility Forecast Based on Financial Text Emotion
    ZHAO Cheng, YE Yao-wei, YAO Ming-hai
    Computer Science    2020, 47 (5): 79-83.   DOI: 10.11896/jsjkx.190400145
    Abstract views: 561 | PDF (4546KB) downloads: 1712
    Emotions in the stock market reflect investor behavior to a certain extent and influence investment decisions. As unstructured data, market news reflects the state of the market environment and, together with stock prices, constitutes vital market reference data that can effectively support investment decisions. This paper proposes a multidimensional emotional feature vectorization method that can accurately and quickly construct feature vectors for massive news data. It uses a support vector machine (SVM) model to predict the impact of financial news on the stock market, and uses bootstrapping to mitigate overfitting. Results on the Shanghai and Shenzhen stock indexes show that, compared with the traditional model, the proposed method improves prediction accuracy by about 8% and obtains an excess return of 6.52% over three months, proving its effectiveness.
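The bootstrap-voting idea can be sketched as follows. The centroid-difference linear rule stands in for the paper's SVMs (to keep the example dependency-free); what matters is the pattern of fitting each model on a resample and majority-voting the predictions to curb overfitting.

```python
import numpy as np

def bootstrap_vote(X, y, X_test, n_models=25, seed=0):
    """Bootstrap-aggregated linear classifiers: each model is fit on a
    resample of (X, y); test predictions are decided by majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))          # bootstrap resample
        Xb, yb = X[idx], y[idx]
        if len(set(yb)) < 2:                           # degenerate resample
            continue
        m1 = Xb[yb == 1].mean(axis=0)
        m0 = Xb[yb == 0].mean(axis=0)
        w = m1 - m0                                    # centroid-difference rule
        b = -w @ (m1 + m0) / 2                         # threshold at midpoint
        votes += (X_test @ w + b > 0).astype(int)
    return (votes > n_models / 2).astype(int)
```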
    Short-term Traffic Flow Prediction Based on DCGRU-RF Model for Road Network
    XIONG Ting, QI Yong, ZHANG Wei-bin
    Computer Science    2020, 47 (5): 84-89.   DOI: 10.11896/jsjkx.190100213
    Abstract views: 392 | PDF (1790KB) downloads: 874
    With the acceleration of urbanization, the number of motor vehicles in Chinese cities is increasing rapidly, making the existing road network capacity insufficient for transportation needs; traffic congestion, environmental pollution and traffic accidents are increasing day by day. Accurate and efficient traffic flow prediction, as the core of intelligent transportation systems (ITS), can effectively address travel and traffic management problems. Existing short-term traffic flow prediction research mainly uses shallow models and therefore cannot fully capture traffic flow characteristics. This paper proposes a short-term traffic flow prediction method based on a DCGRU-RF model for complex road network structures. The DCGRU (diffusion convolutional gated recurrent unit) network characterizes the spatio-temporal correlations in the traffic flow time series; after the dependencies and latent features in the data are obtained, a random forest (RF) model is selected as the predictor, and a nonlinear prediction model is constructed from the extracted features to produce the final result. In the experiments, 38 detectors on two urban roads were selected as experimental objects, with traffic flow data from five working days, and the proposed model was compared with other common traffic flow prediction models. The results show that the DCGRU-RF model further improves prediction accuracy, reaching 95%.
    Multi-label Learning Algorithm Based on Association Rules in Big Data Environment
    WANG Qing-song, JIANG Fu-shan, LI Fei
    Computer Science    2020, 47 (5): 90-95.   DOI: 10.11896/jsjkx.190300150
    Abstract views: 485 | PDF (1446KB) downloads: 1234
    In traditional single-label mining, each sample belongs to only one label and labels are mutually exclusive. In multi-label learning, one sample may correspond to multiple labels, and labels are often correlated with each other; research on label correlation has gradually become a hot issue in multi-label learning. First, to adapt to the big data environment, the traditional association rule mining algorithm Apriori is parallelized and improved: a Hadoop-based parallel algorithm, Apriori_ING, is proposed to perform candidate set generation, pruning and support counting in parallel. Drawing on the frequent itemsets and association rules obtained by Apriori_ING, an inference-engine-based label set generation algorithm, IETG, is proposed. The label sets are then applied to multi-label learning, and a multi-label learning algorithm, FreLP, is proposed. FreLP uses association rules to generate label sets, decomposes the original label set into multiple subsets, and then uses the LP algorithm to train the classifiers. FreLP is compared with existing multi-label learning algorithms, and the experimental results show that it achieves better results under different evaluation indicators.
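The level-wise candidate generation, pruning and support counting that Apriori_ING parallelizes can be shown in their sequential form; the Hadoop distribution of these three steps is omitted here.

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Level-wise Apriori: generate k-itemset candidates from the
    (k-1)-frequent itemsets, prune by the downward-closure property,
    count support, and keep itemsets meeting min_support."""
    items = {frozenset([i]) for t in transactions for i in t}
    freq, level = {}, set()
    for c in items:                              # frequent 1-itemsets
        s = sum(1 for t in transactions if c <= t)
        if s >= min_support:
            freq[c] = s
            level.add(c)
    k = 2
    while level:
        cands = {a | b for a in level for b in level if len(a | b) == k}
        next_level = set()
        for c in cands:
            # prune: every (k-1)-subset must itself be frequent
            if all(frozenset(sub) in freq for sub in combinations(c, k - 1)):
                s = sum(1 for t in transactions if c <= t)
                if s >= min_support:
                    freq[c] = s
                    next_level.add(c)
        level, k = next_level, k + 1
    return freq
```

In the parallel version, the support-counting loop is the part distributed across Hadoop mappers, with reducers summing per-candidate counts.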
    Event Detection Method Based on Node Evolution Staged Optimization
    FU Kun, QIU Qian, ZHAO Xiao-meng, GAO Jin-hui
    Computer Science    2020, 47 (5): 96-102.   DOI: 10.11896/jsjkx.190400072
    Abstract views: 259 | PDF (1737KB) downloads: 692
    Link prediction is an effective method to analyze network evolution and also provides a new idea for detecting social events. Most event detection approaches based on link prediction start from the macroscopic evolution of the whole network; the few that incorporate node-level evolution lack stability and are not sensitive enough to events to detect their occurrence accurately. Therefore, an event detection method based on staged optimization of node evolution (NESO_ED) is proposed. First, the stability of event detection is enhanced by a staged optimization method, which produces an array of node index weights. Then, according to different rules, the optimal similarity index is selected for each node, so that nodes can better quantify network evolution and the sensitivity of event detection is improved. The changes of the indexes selected by nodes during network evolution are also analyzed, revealing the different effects of events on node evolution. On the real social network VAST, the event detection sensitivity of NESO_ED is 227% higher than that of LinkEvent and 63% higher than that of NodeED, and its stability is 66% higher than that of NodeED, showing that NESO_ED can detect events more accurately and stably.
    Imbalance Data Classification Based on Model of Multi-class Neighbourhood Three-way Decision
    XIANG Wei, WANG Xin-wei
    Computer Science    2020, 47 (5): 103-109.   DOI: 10.11896/jsjkx.180601099
    Abstract views: 380 | PDF (1387KB) downloads: 750
    Imbalanced data classification is an important problem; traditional classification algorithms perform poorly on the smaller classes of imbalanced data. This paper therefore proposes an imbalanced data classification algorithm based on a multi-class neighbourhood three-way decision model. For the case of mixed data with multiple classes, the traditional three-way decision is first generalized, and a multi-class neighbourhood three-way decision model for mixed data is presented. A method for setting the cost function adaptively within the model is then given, and on this basis the imbalanced data classification algorithm of the multi-class neighbourhood three-way decision model is proposed. Simulation results show that the proposed algorithm has better classification performance on imbalanced data.
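The three-way split at the heart of the model can be sketched with two thresholds on class-membership probability; the `alpha` and `beta` values here are illustrative defaults, not the adaptive costs the paper derives.

```python
def three_way_regions(probs, alpha=0.7, beta=0.3):
    """Split samples into positive (accept), negative (reject) and
    boundary (defer) regions by thresholds alpha > beta on the
    conditional probability of class membership."""
    pos = [i for i, p in enumerate(probs) if p >= alpha]
    neg = [i for i, p in enumerate(probs) if p <= beta]
    bnd = [i for i, p in enumerate(probs) if beta < p < alpha]
    return pos, neg, bnd
```

Deferred (boundary) samples are exactly the ones an imbalance-aware cost function can treat more carefully instead of forcing an immediate majority-class decision.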
    Page 1 of 1, 14 records