Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Content of CCF Big Data 2017 in our journal
    Research on Deep Reinforcement Learning
    ZHAO Xing-yu, DING Shi-fei
    Computer Science    2018, 45 (7): 1-6.   DOI: 10.11896/j.issn.1002-137X.2018.07.001
    As a new machine learning method, deep reinforcement learning combines deep learning and reinforcement learning, which enables an agent to perceive information from a high-dimensional space, train a model, and make decisions based on the received information. Because of its universality and effectiveness, deep reinforcement learning has been widely researched and applied in various fields of daily life. First, an overview of deep reinforcement learning research was given and its basic theory was introduced. Then, value-based and policy-based algorithms were introduced. After that, the application prospects of deep reinforcement learning were discussed. Finally, related research was summarized and future directions were outlined.
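To make the value-based family concrete, the sketch below shows tabular Q-learning, whose update rule Q(s, a) ← Q(s, a) + α[r + γ·max Q(s′, ·) − Q(s, a)] is what DQN-style methods approximate with a deep network. It is an illustration, not code from the survey: the five-state chain environment, the reward of 1 at the goal, and the values of α, γ and ε are all assumptions chosen for the example.

```python
import random
from typing import Dict, List, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> estimated value


def q_update(q: QTable, s: int, a: int, r: float, s_next: int,
             actions: List[int], alpha: float, gamma: float, terminal: bool) -> None:
    """One Q-learning step: move Q(s, a) toward the TD target r + gamma * max_b Q(s', b)."""
    best_next = 0.0 if terminal else max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def epsilon_greedy(q: QTable, s: int, actions: List[int], epsilon: float,
                   rng: random.Random) -> int:
    """Explore with probability epsilon; otherwise act greedily, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q.get((s, b), 0.0) for b in actions)
    return rng.choice([b for b in actions if q.get((s, b), 0.0) == best])


def train_chain(n_states: int = 5, episodes: int = 500, alpha: float = 0.1,
                gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0) -> QTable:
    """Hypothetical toy task: walk along a chain 0..n_states-1; reaching the last state pays 1."""
    rng = random.Random(seed)
    actions = [0, 1]  # 0 = step left, 1 = step right
    q: QTable = {}
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # cap episode length
            a = epsilon_greedy(q, s, actions, epsilon, rng)
            s_next = max(0, s - 1) if a == 0 else s + 1
            terminal = s_next == n_states - 1
            q_update(q, s, a, 1.0 if terminal else 0.0, s_next, actions, alpha, gamma, terminal)
            if terminal:
                break
            s = s_next
    return q


if __name__ == "__main__":
    q = train_chain()
    for s in range(4):
        print(s, "right" if q[(s, 1)] > q.get((s, 0), 0.0) else "left")
```

Replacing the dictionary with a neural network that maps a high-dimensional observation to Q-values, and training it toward the same TD target, is the step that turns this into a deep value-based method.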
    Survey on Performance Optimization Technologies for Spark
    LIAO Hu-sheng, HUANG Shan-shan, XU Jun-gang, LIU Ren-feng
    Computer Science    2018, 45 (7): 7-15.   DOI: 10.11896/j.issn.1002-137X.2018.07.002
    In recent years, with the advent of the big data era, big data processing platforms have developed rapidly. A large number of them, including Hadoop, Spark and Storm, have appeared, among which Apache Spark is the most prominent. With the wide application of Spark in China and abroad, many performance problems remain to be solved. Because Spark's underlying implementation mechanism is very complex, it is difficult for ordinary users to find performance bottlenecks, let alone optimize further. In light of these problems, performance optimization technologies for Spark were summarized and analyzed from five aspects: development principle optimization, memory optimization, configuration parameter optimization, scheduling optimization and shuffle process optimization. Finally, the key problems of Spark optimization technologies were summarized and future research issues were proposed.
    Collaborative Filtering Recommendation Algorithm Based on Space Transformation
    ZHAO Xing-wang, LIANG Ji-ye, GUO Lan-jie
    Computer Science    2018, 45 (7): 16-21.   DOI: 10.11896/j.issn.1002-137X.2018.07.003
    In real applications, traditional collaborative filtering recommendation algorithms usually face the problem of computational scalability. To solve this problem, a collaborative filtering recommendation algorithm based on space transformation was proposed within the framework of item-based collaborative filtering. Specifically, users are first divided into different clusters according to user social network information by using a community discovery algorithm. Then, item clusters are found according to the correspondence between users and items in the rating matrix, and the membership of each item in each item cluster is calculated. The sparse, high-dimensional rating matrix is thus transformed into a low-dimensional, dense membership matrix, and the similarities between items are then computed on the transformed matrix. The proposed algorithm was compared with other algorithms on a public dataset. The experimental results show that it significantly improves computational efficiency while preserving recommendation accuracy.
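The transformation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: user communities are taken as given (the community discovery step is skipped), and the membership of an item in item cluster k is assumed to be the share of the item's ratings that come from users in community k. Item-item cosine similarity is then computed on these short, dense membership vectors instead of on sparse rating columns, and a standard item-based weighted average predicts a rating.

```python
import math
from typing import Dict, List

Ratings = Dict[int, Dict[int, float]]  # ratings[item][user] = rating


def item_membership(ratings: Ratings, user_cluster: Dict[int, int],
                    n_clusters: int) -> Dict[int, List[float]]:
    """Assumed membership: fraction of an item's raters that belong to each user community."""
    membership: Dict[int, List[float]] = {}
    for item, by_user in ratings.items():
        counts = [0.0] * n_clusters
        for user in by_user:
            counts[user_cluster[user]] += 1.0
        total = sum(counts)
        membership[item] = [c / total for c in counts] if total else counts
    return membership


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(user: int, item: int, ratings: Ratings,
            membership: Dict[int, List[float]], k: int = 2) -> float:
    """Item-based CF: weight the user's own ratings by similarity in membership space."""
    rated = [(j, r[user]) for j, r in ratings.items() if j != item and user in r]
    neighbours = sorted(((cosine(membership[item], membership[j]), rj) for j, rj in rated),
                        reverse=True)[:k]
    den = sum(abs(s) for s, _ in neighbours)
    return sum(s * r for s, r in neighbours) / den if den else 0.0


if __name__ == "__main__":
    ratings = {0: {0: 5, 1: 4}, 1: {0: 4, 1: 5, 2: 2}, 2: {2: 1, 3: 2}}
    communities = {0: 0, 1: 0, 2: 1, 3: 1}  # toy, assumed output of community discovery
    m = item_membership(ratings, communities, n_clusters=2)
    print(m[1], predict(2, 0, ratings, m))
```

The saving comes from vector length: each similarity now costs O(number of clusters) rather than O(number of users), which is where the efficiency gain described in the abstract would come from.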
    Influence of Noisy Features on Internal Validation of Clustering
    YANG Hu, FU Yu, FAN Dan
    Computer Science    2018, 45 (7): 22-30.   DOI: 10.11896/j.issn.1002-137X.2018.07.004
    Internal validation measures are essential in clustering analysis: they are used to evaluate clustering results and serve as indicators for finding the optimal number of clusters when the true structure of the sample is unknown. Although many studies focus on the performance of internal validation measures and have found that some measures perform better than others, they ignore the influence of the noisy features present in real data, which may mislead the selection and application of these measures. This study selected 10 clustering validation measures to determine the number of clusters on simulated and real datasets, in order to analyze the influence of noisy features on the choice of internal validation measure and on clustering results. Results indicate that noisy features in a dataset affect all of the internal validation indices except KL, CH and CCC, and that the accuracy of the clustering results decreases as the noise increases.
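For reference, CH (Calinski-Harabasz) scores a partition of n points into k clusters as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: CH = [B/(k − 1)] / [W/(n − k)], where B sums n_j·‖c_j − c‖² over clusters (c_j the cluster centroid, c the overall mean) and W sums ‖x − c_j‖² over points; larger is better. The pure-Python sketch below is illustrative only and is not the study's code; the toy points are made up, and in practice scikit-learn's `calinski_harabasz_score` computes the same quantity.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> List[float]:
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def calinski_harabasz(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """CH = (B / (k - 1)) / (W / (n - k)); higher means tighter, better-separated clusters."""
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("CH needs 2 <= k < n clusters")
    overall = _mean(list(points))
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)  # B term
        within += sum(_sq_dist(p, centroid) for p in members)   # W term
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [[0.0], [1.0], [10.0], [11.0]]
    print(calinski_harabasz(pts, [0, 0, 1, 1]))  # natural split
    print(calinski_harabasz(pts, [0, 1, 0, 1]))  # interleaved split
```

To choose the number of clusters with CH, one would run a clustering algorithm for each candidate k and keep the k with the highest score.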
    Ensemble Learning Method for Imbalanced Data Based on Sample Weight Updating
    CHEN Sheng-ling, SHEN Si-qi, LI Dong-sheng
    Computer Science    2018, 45 (7): 31-37.   DOI: 10.11896/j.issn.1002-137X.2018.07.005
    The problem of imbalanced data is prevalent in many big data and machine learning applications, such as medical diagnosis and anomaly detection. Researchers have proposed or used a number of methods for imbalanced learning, including data sampling (e.g., SMOTE) and ensemble learning (e.g., EasyEnsemble) methods. Oversampling methods may suffer from over-fitting or low classification accuracy on boundary samples, while under-sampling methods may lead to under-fitting. This paper proposed the Rotation SMOTE algorithm, which incorporates the basic ideas of SMOTE, Bagging, Boosting and other algorithms, and uses SMOTE to indirectly increase the weight of minority samples based on the prediction results of the base classifier during the Boosting process. Following the basic idea of Focal Loss, this paper also proposed the FocalBoost algorithm, which directly optimizes AdaBoost's sample weight updating strategy based on the prediction results of the base classifier. In experiments with multiple evaluation metrics on 11 imbalanced datasets from different application fields, Rotation SMOTE obtains the highest recall on all datasets compared with other imbalanced data learning algorithms (including SMOTEBoost and EasyEnsemble), and achieves the best or second-best G-mean and F1 score on most datasets, while FocalBoost outperforms the original AdaBoost on 9 of these imbalanced datasets.
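The abstract does not give FocalBoost's update formula, so the sketch below only illustrates the general idea under an explicit assumption. It implements the standard discrete AdaBoost reweighting w_i ← w_i·exp(−α·y_i·h(x_i)), with α = ½·ln((1 − ε)/ε) and ε the weighted error, and optionally multiplies each weight by a focal-style factor (1 − p_i)^γ, where p_i is the base classifier's probability for sample i's true class. The focal factor, the `p_true` input and the `gamma` parameter are hypothetical; with γ = 0 the function reduces to plain AdaBoost.

```python
import math
from typing import List, Optional, Sequence, Tuple


def reweight(weights: Sequence[float], y_true: Sequence[int], y_pred: Sequence[int],
             p_true: Optional[Sequence[float]] = None,
             gamma: float = 0.0) -> Tuple[float, List[float]]:
    """One boosting round. Labels are in {-1, +1}; returns (alpha, normalised new weights).

    gamma > 0 applies a hypothetical focal-style factor (1 - p_true[i]) ** gamma that
    shrinks the weight of samples the base classifier already gets right confidently.
    """
    total = sum(weights)
    eps = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h) / total
    eps = min(max(eps, 1e-10), 1.0 - 1e-10)  # keep the log finite
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    new = []
    for i, (w, y, h) in enumerate(zip(weights, y_true, y_pred)):
        w_new = w * math.exp(-alpha * y * h)  # up-weight mistakes, down-weight hits
        if gamma > 0.0:
            if p_true is None:
                raise ValueError("p_true is required when gamma > 0")
            w_new *= (1.0 - p_true[i]) ** gamma
        new.append(w_new)
    z = sum(new)
    return alpha, [w / z for w in new]


if __name__ == "__main__":
    w0 = [0.25] * 4
    y = [1, 1, -1, -1]
    h = [1, 1, -1, 1]  # the last sample is misclassified
    print(reweight(w0, y, h))
    print(reweight(w0, y, h, p_true=[0.9, 0.6, 0.8, 0.3], gamma=2.0))
```

In the plain AdaBoost call the misclassified sample ends up holding half of the total weight; the focal-style variant additionally separates easy from hard samples among those classified correctly. A sample with p_i = 1 would receive zero weight under this assumed factor, which a real method would need to guard against.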
    Network Representation Model Based on Multi-architectures and Text Fusion
    LI Jia-yi, ZHAO Yu, WANG Li
    Computer Science    2018, 45 (7): 38-41.   DOI: 10.11896/j.issn.1002-137X.2018.07.006
    Network representation obtains vector representations of nodes by deeply learning the network structure and mines latent information in the network; it is an important dimensionality reduction method in social computing. For TADW, a network representation method that combines text and structure through matrix factorization, this paper first analyzed the influence of the position of the text attribute matrix on network representation. It then proposed a social network representation method that incorporates relationship structure, interaction structure and textual attributes. Experimental results on multiple datasets show that the proposed method outperforms other classical network representation methods in classification tasks.
    R-tree for Frequent Updates and Multi-user Concurrent Accesses Based on HBase
    WANG Bo-tao, LIANG Wei, ZHAO Kai-li, ZHONG Han-hui, ZHANG Yu-qi
    Computer Science    2018, 45 (7): 42-52.   DOI: 10.11896/j.issn.1002-137X.2018.07.007
    Applications based on location-based services (LBS) have entered the era of big data, and traditional LBS techniques face new challenges such as scalability and performance. Cloud computing technology is the basis of big data processing, and indexing is an important way to optimize queries. Although there are many research results, to the best of our knowledge there is no HBase-based R-tree index that supports frequent updates and multi-user concurrent access. Targeting these characteristics of moving objects, this paper proposed a new HBase-based R-tree index that supports frequent updates and multi-user concurrent access. In this index, the R-tree indexes only the grid cells that contain moving objects, which effectively avoids the problem of frequent updates. Furthermore, based on the organization of HBase data rows and the I/O characteristics of data partitions, this paper reorganized the nodes and encoded the grid cells with z-order, which reduces HBase read and write operations and improves query efficiency. Finally, it proposed an optimization strategy for distributed read-write locks based on ZooKeeper, which improves the throughput of the new index. Experimental results show that, compared with the grid index on unevenly distributed data, the proposed strategy improves query throughput by 25% to 50% while keeping update throughput at about the same level. Compared with an index using distributed shared locks, the query throughput of the index using distributed read-write locks is increased by nearly 40%.
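Z-order (Morton) encoding, which the index applies to grid cells, interleaves the bits of a cell's column and row numbers so that cells close in 2-D space tend to receive nearby codes. The sketch below shows that encoding plus one hypothetical way to turn a code into a row key. The fixed-width hexadecimal format is an assumption, chosen because HBase keeps rows sorted lexicographically by row-key bytes, so zero-padding makes string order match numeric Z-order; the grid origin and cell size are also illustrative assumptions, not values from the paper.

```python
from typing import Tuple


def z_order(col: int, row: int, bits: int = 16) -> int:
    """Interleave bits: col bit i -> code bit 2i, row bit i -> code bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def grid_cell(lon: float, lat: float, origin: Tuple[float, float] = (-180.0, -90.0),
              cell_size: float = 0.01) -> Tuple[int, int]:
    """Map a coordinate to (col, row) on a uniform grid (assumed origin and cell size)."""
    return int((lon - origin[0]) // cell_size), int((lat - origin[1]) // cell_size)


def row_key(code: int, bits: int = 16) -> str:
    """Hypothetical row key: zero-padded hex so lexicographic order equals numeric order."""
    width = (2 * bits + 3) // 4  # hex digits needed for a 2*bits-bit code
    return format(code, f"0{width}x")


if __name__ == "__main__":
    col, row = grid_cell(116.39, 39.91)
    print(col, row, row_key(z_order(col, row)))
```

Because neighbouring cells often share a long code prefix, a range scan over consecutive row keys tends to touch spatially close cells, which is one plausible reason a Z-order layout can reduce HBase reads for range queries.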