Computer Science

Research on Deep Reinforcement Learning

ZHAO Xing-yu, DING Shi-fei

Computer Science 2018, 45 (7): 1-6. DOI: 10.11896/j.issn.1002-137X.2018.07.001

As a new machine learning method, deep reinforcement learning combines deep learning and reinforcement learning, enabling an agent to perceive information from a high-dimensional space, train a model, and make decisions based on the information received. Because of its generality and effectiveness, deep reinforcement learning has been widely studied and applied in various fields of daily life. This paper first gives an overview of deep reinforcement learning research and introduces its basic theory. It then introduces value-based and policy-based algorithms. After that, the application prospects of deep reinforcement learning are discussed. Finally, related research is summarized and future directions are outlined.

Survey on Performance Optimization Technologies for Spark

LIAO Hu-sheng, HUANG Shan-shan, XU Jun-gang, LIU Ren-feng

Computer Science 2018, 45 (7): 7-15. DOI: 10.11896/j.issn.1002-137X.2018.07.002

In recent years, with the advent of the big data era, big data processing platforms have developed rapidly. Many such platforms have appeared, including Hadoop, Spark, and Storm, among which Apache Spark is the most prominent. As Spark is widely adopted at home and abroad, many performance problems remain to be solved. Because Spark's underlying implementation is very complex, ordinary users find it difficult to locate performance bottlenecks, let alone optimize further. To address these problems, this paper summarizes and analyzes performance optimization technologies for Spark from five aspects: development principle optimization, memory optimization, configuration parameter optimization, scheduling optimization, and shuffle process optimization. Finally, the key problems of Spark optimization technologies are summarized and future research issues are proposed.

Collaborative Filtering Recommendation Algorithm Based on Space Transformation

ZHAO Xing-wang, LIANG Ji-ye, GUO Lan-jie

Computer Science 2018, 45 (7): 16-21. DOI: 10.11896/j.issn.1002-137X.2018.07.003

In real applications, traditional collaborative filtering recommendation algorithms often face the problem of computational scalability. To solve this problem, this paper proposes a collaborative filtering recommendation algorithm based on space transformation within the framework of item-based collaborative filtering. Specifically, users are first divided into clusters by applying a community discovery algorithm to the user social network. Item clusters are then found from the correspondence between users and items in the rating matrix, and the membership of each item in each item cluster is calculated. In this way the sparse, high-dimensional rating matrix is transformed into a dense, low-dimensional membership matrix, and the similarities between items are computed on the transformed matrix. The proposed algorithm was compared with other algorithms on a public data set. The experimental results show that it can significantly improve computational efficiency while maintaining recommendation accuracy.
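The transformation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the membership formula, so the definition used here (the share of an item's ratings that come from each user cluster) is an assumption for illustration only, as are the function names `item_membership` and `cosine_similarity`. User cluster labels are taken as given, since the community discovery step is not specified.

```python
import numpy as np


def item_membership(ratings, user_cluster, n_clusters):
    """Map a sparse (n_users, n_items) rating matrix to a dense
    (n_items, n_clusters) membership matrix.

    Assumption (not from the paper): an item's membership in cluster k is
    the fraction of its ratings that come from users in cluster k.
    A rating of 0 means "not rated".
    """
    ratings = np.asarray(ratings, dtype=float)
    user_cluster = np.asarray(user_cluster)
    rated = (ratings > 0).astype(float)           # (n_users, n_items)
    one_hot = np.eye(n_clusters)[user_cluster]    # (n_users, n_clusters)
    counts = rated.T @ one_hot                    # (n_items, n_clusters)
    totals = counts.sum(axis=1, keepdims=True)
    # Items with no ratings get an all-zero membership row.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def cosine_similarity(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = np.divide(matrix, norms, out=np.zeros_like(matrix), where=norms > 0)
    return unit @ unit.T


# Toy data: 3 users, 3 items, users 0 and 1 in cluster 0, user 2 in cluster 1.
ratings = [[5, 0, 3],
           [4, 0, 0],
           [0, 2, 1]]
membership = item_membership(ratings, [0, 0, 1], n_clusters=2)
similarity = cosine_similarity(membership)
```

Item similarities are computed over `n_clusters` columns instead of `n_users` columns, which is where the efficiency gain claimed in the abstract would come from when the number of clusters is much smaller than the number of users.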

Influence of Noisy Features on Internal Validation of Clustering

YANG Hu, FU Yu, FAN Dan

Computer Science 2018, 45 (7): 22-30. DOI: 10.11896/j.issn.1002-137X.2018.07.004

Internal validation measures are essential in clustering analysis. They are used to evaluate the quality of clustering results and to find the optimal number of clusters when the true structure of the sample is unknown. Although many studies have examined the performance of internal validation measures and found that some perform better than others, they ignore the influence of the noisy features present in real data, which may mislead the selection and application of these measures. This study selected 10 clustering validation measures to determine the number of clusters in simulated and real datasets, in order to analyze the influence of noisy features on the choice of internal validation measure and on clustering results. The results indicate that noisy features in a dataset affect all the internal validation indices of clustering except KL, CH and CCC, and that the accuracy of the clustering results decreases as noise increases.
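As a concrete instance of an internal validation measure, the sketch below computes the Calinski-Harabasz (CH) index, one of the measures the abstract names, from its standard definition: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k). It is not the authors' experimental code, and the function name `calinski_harabasz` and the toy data are illustrative assumptions.

```python
import numpy as np


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index; higher values indicate better-separated clusters.

    Raises ZeroDivisionError-like behaviour (division by zero) is avoided by
    requiring 2 <= k < n; a within-cluster dispersion of exactly zero is not
    handled in this sketch.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = points.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    overall_mean = points.mean(axis=0)
    between = 0.0  # weighted squared distance of cluster centers to the overall mean
    within = 0.0   # squared distance of points to their own cluster center
    for c in clusters:
        members = points[labels == c]
        center = members.mean(axis=0)
        between += members.shape[0] * np.sum((center - overall_mean) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated pairs of points along the first feature.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # split along the informative feature
bad = calinski_harabasz(points, [0, 1, 0, 1])   # split along the uninformative feature
```

To choose the number of clusters, such an index is typically evaluated for each candidate k and the k with the best score is kept.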

Ensemble Learning Method for Imbalanced Data Based on Sample Weight Updating

CHEN Sheng-ling, SHEN Si-qi, LI Dong-sheng

Computer Science 2018, 45 (7): 31-37. DOI: 10.11896/j.issn.1002-137X.2018.07.005

Imbalanced data is prevalent in many applications of big data and machine learning, such as medical diagnosis and anomaly detection. Researchers have proposed or used a number of methods for imbalanced learning, including data sampling methods (e.g. SMOTE) and ensemble learning methods (e.g. EasyEnsemble). Among data sampling methods, over-sampling may cause over-fitting or low classification accuracy on boundary samples, while under-sampling may lead to under-fitting. This paper proposes the Rotation SMOTE algorithm, which incorporates the basic ideas of SMOTE, Bagging, Boosting and other algorithms, and uses SMOTE to indirectly increase the weight of minority samples based on the prediction results of the base classifier during the Boosting process. Following the basic idea of Focal Loss, this paper also proposes the FocalBoost algorithm, which directly optimizes the sample weight updating strategy of AdaBoost based on the prediction results of the base classifier. In experiments with multiple evaluation metrics on 11 imbalanced data sets from different application fields, Rotation SMOTE obtains the highest recall on all datasets compared with other imbalanced learning algorithms (including SMOTEBoost and EasyEnsemble) and achieves the best or second-best G-mean and F1 score on most datasets, while FocalBoost outperforms the original AdaBoost on 9 of these datasets.

Network Representation Model Based on Multi-architectures and Text Fusion

LI Jia-yi, ZHAO Yu, WANG Li

Computer Science 2018, 45 (7): 38-41. DOI: 10.11896/j.issn.1002-137X.2018.07.006

Network representation obtains vector representations of nodes by learning the network structure in depth and mines the latent information in the network. It is an important dimensionality reduction method in social computing. Taking TADW, a network representation method based on matrix decomposition that combines text and structure, as its starting point, this paper first analyzes and discusses how the position of the text attribute matrix influences network representation. It then proposes a social network representation method that incorporates relationship structure, interaction structure and textual attributes. Experimental results on multiple datasets show that the proposed method outperforms other classical network representation methods in classification tasks.

R-tree for Frequent Updates and Multi-user Concurrent Accesses Based on HBase

WANG Bo-tao, LIANG Wei, ZHAO Kai-li, ZHONG Han-hui, ZHANG Yu-qi

Computer Science 2018, 45 (7): 42-52. DOI: 10.11896/j.issn.1002-137X.2018.07.007

Applications built on location-based services (LBS) have entered the era of big data, and traditional LBS techniques face new challenges such as scalability and performance. Cloud computing is the basis of big data processing, and indexing is an important way to optimize queries. Although a large body of research exists, to the best of the authors' knowledge there is no HBase-based R-tree index that supports frequent updates and multi-user concurrent access. To meet these requirements of moving-object workloads, this paper proposes a new HBase-based R-tree index that supports both. In the new index, the R-tree indexes only the grid cells that contain moving objects, which effectively avoids the problem of frequent updates. Furthermore, based on the organization of HBase data rows and the I/O characteristics of data partitions, the paper reorganizes the nodes and encodes the grid cells with z-order, which reduces HBase read and write operations and improves query efficiency. Finally, it proposes an optimization strategy for distributed read-write locks based on Zookeeper, which improves the throughput of the new index. The experimental results show that, compared with the grid index and under unevenly distributed data, the query throughput of the proposed strategy improves by 25% to 50% while the update throughput stays at about the same level. Compared with the index using distributed shared locks, the query throughput of the index using distributed read-write locks increases by nearly 40%.
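The z-order encoding of grid cells mentioned in this abstract can be sketched as a plain Morton code, which interleaves the bits of a cell's column and row so that cells close together in the grid tend to receive nearby codes. The paper's actual row-key layout is not given in the abstract; the `row_key` helper, the 16-bit default, and the hexadecimal key format below are hypothetical choices for illustration.

```python
def z_order_encode(col, row, bits=16):
    """Interleave the low `bits` bits of col and row (col in even positions)."""
    limit = 1 << bits
    if not (0 <= col < limit and 0 <= row < limit):
        raise ValueError("cell coordinates out of range for the given bit width")
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def z_order_decode(code, bits=16):
    """Inverse of z_order_encode: recover (col, row) from a Morton code."""
    col = row = 0
    for i in range(bits):
        col |= ((code >> (2 * i)) & 1) << i
        row |= ((code >> (2 * i + 1)) & 1) << i
    return col, row


def row_key(col, row, bits=16):
    """Hypothetical row key: fixed-width hex, so that lexicographic key order
    matches numeric z-order."""
    width = (2 * bits + 3) // 4
    return format(z_order_encode(col, row, bits), f"0{width}x")


# The four cells of a 2x2 block map to four consecutive codes.
block = [z_order_encode(c, r) for r in (0, 1) for c in (0, 1)]
```

Because a store such as HBase keeps rows sorted by key, a fixed-width key of this kind places the cells of a small square block in one contiguous key range, which is one plausible reason such an encoding reduces the number of reads for a spatial range query.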
