Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Content of Data Science in our journal
    Database of Chinese Domestic Films for Box-office Revenue Forecasting
    SHI Zheng, XU Ming-xing
    Computer Science    2019, 46 (11A): 149-152.  
    Predicting film box-office revenue is an active research area in the global film industry, and a rich film database is the cornerstone of such research. To address the gap between the film industries of China and Western countries and the limited records available for Chinese domestic films, this paper establishes a database of Chinese domestic films for box-office revenue forecasting. Firstly, the global status of research on box-office revenue forecasting is reviewed. Secondly, the ideas and detailed procedures for establishing the database of domestic films are introduced. Finally, the proposed database is compared with well-established databases of films from other countries using the same box-office revenue prediction method. The test results show that the proposed database performs similarly to the other databases, confirming that the domestic film database is valid for forecasting box-office revenue.
    Application of Active Learning in Recommendation System
    ZHAO Hai-yan, WANG Jing, CHEN Qing-kui, CAO Jian
    Computer Science    2019, 46 (11A): 153-158.  
    In recent years, recommender systems have developed very quickly and are becoming increasingly mature. However, many approaches rest on an ideal assumption, i.e., that plenty of sample data is available to train a mature model for prediction or recommendation. In actual industrial production, most users and products lack rating information or consumption records, and datasets formed by historical accumulation are unevenly distributed, so it is hard to learn a reliable model. Active learning holds that each item benefits the system differently, so particular items can be selected through specific strategies, and the related preference information can be actively obtained through interaction between the user and the item. Active learning applied to recommender systems attempts to train a model with fewer but higher-quality samples, which improves the user experience and guards against unbalanced datasets. This paper reviews and summarizes recent applications of active learning in recommender systems and discusses future directions.
    Study on Interdisciplinary Model of Construction of Big Data Discipline in China
    NING Hui-cong
    Computer Science    2019, 46 (11A): 159-162.  
    With the vigorous development of a new generation of information technology represented by big data, cloud computing and artificial intelligence, the digital economy has become an important engine driving China’s economic growth. It is of great significance to speed up the construction of the big data discipline and to train a new generation of information technology talent. At present, many universities and research institutes at home and abroad train big data talent, but there is no mature model for constructing the big data discipline. Therefore, this paper summarizes the existing achievements in the construction of the big data discipline at home and abroad, and analyzes them using the Delphi method (expert investigation method) and the case analysis method. Lastly, combining interdisciplinary research with personnel training mechanisms, an interdisciplinary model of “point”, “line”, “plane” and “three-dimensional” for the construction of the big data discipline in China is proposed, providing a useful reference for research on the interdisciplinary development of the big data discipline in China.
    Research on Relationship Between Bipartite Network Recommendation Algorithm and Collaborative Filtering Algorithm
    ZHOU Bo
    Computer Science    2019, 46 (11A): 163-166.  
    This paper introduces the basic principles of the collaborative filtering algorithm and the bipartite network recommendation algorithm, and proposes a general bipartite network recommendation algorithm. The internal relationship between the two algorithms is analyzed. The results show that the collaborative filtering algorithm is a special case of the bipartite network recommendation algorithm, and the bipartite network algorithm is proved to perform better than the collaborative recommendation algorithm. This research systematizes and unifies the theory of bipartite recommendation algorithms and promotes the further development of recommendation algorithms.
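The abstract does not spell out the general algorithm. As an illustration only, the sketch below implements the widely used two-step resource-allocation ("mass diffusion") form of bipartite network recommendation, which is assumed here and may differ from the paper's formulation. The function name `mass_diffusion_scores` and the `user_items` dictionary layout are hypothetical.

```python
def mass_diffusion_scores(user_items, target):
    """Score unseen items for `target` by two-step resource spreading
    on the user-item bipartite graph (illustrative sketch only)."""
    # Invert the user -> items map to get item -> users.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    # Step 1: every item the target collected holds one unit of resource
    # and splits it equally among the users who collected that item.
    user_resource = {}
    for item in user_items[target]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] = user_resource.get(user, 0.0) + share

    # Step 2: every user splits the received resource equally among
    # the items that user collected.
    scores = {}
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for item in user_items[user]:
            scores[item] = scores.get(item, 0.0) + share

    # Only items the target has not collected are recommendation candidates.
    return {i: s for i, s in scores.items() if i not in user_items[target]}


example = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
scores = mass_diffusion_scores(example, "u1")
```

In this toy graph, item `c` ends up with 5/12 of a unit and item `d` with 1/6, so `c` would be ranked first for `u1`.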
    Dynamical Network Clustering Algorithm Based on Weighting Strategy
    WANG Zi-jie, ZHOU Ya-jing, LI Hui-jia
    Computer Science    2019, 46 (11A): 167-171.  
    Network dynamics play an important role in analyzing the correlation between functional properties and topological structure. This paper proposes a novel dynamical iteration algorithm that incorporates a weighting scheme, i.e., weighting W and tightness T, into the iterative process of the membership vector. These new elements can be used to adjust link strength and node compactness, improving the speed and accuracy of community structure detection. To estimate the optimal stopping time of the iteration, this paper uses a stability function defined as the auto-covariance of a Markov random walk. The algorithm is very efficient and does not need the number of communities to be specified in advance, and it naturally supports overlapping communities by associating each node with a membership vector describing the node’s involvement in each community. Theoretical analysis and experiments show that the algorithm can uncover communities effectively and efficiently.
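For reference, the stability of a partition is commonly written in the Markov stability literature as the auto-covariance of a random walk on the graph. The form below is that standard definition for a hard partition; it is assumed, not confirmed, to be the one the abstract refers to, and the paper's version for membership vectors may differ. Here $A$ is the adjacency matrix, $D$ the diagonal degree matrix, $M = D^{-1}A$ the transition matrix, $\pi$ the stationary distribution as a column vector, $\Pi = \mathrm{diag}(\pi)$, and $H$ the $n \times c$ community indicator matrix.

```latex
R(t) = \operatorname{trace}\!\left[ H^{\top} \left( \Pi M^{t} - \pi \pi^{\top} \right) H \right]
```

Each diagonal term sums $\pi_i (M^{t})_{ij} - \pi_i \pi_j$ over node pairs in one community: the probability that a stationary walker starts in the community and is found in it again after $t$ steps, minus the value expected if the two events were independent.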
    Personalized Question Recommendation Based on Autoencoder and Two-step Collaborative Filtering
    XIONG Hui-jun, SONG Yi-fan, ZHANG Peng, LIU Li-bo
    Computer Science    2019, 46 (11A): 172-177.  
    Personalized question recommendation is an effective way to improve learning efficiency. It helps students get away from “massive questions” and is important for achieving adaptive teaching and promoting education equity. However, most personalized question recommendation methods are based on collaborative filtering and do not focus on knowledge points, so the recommended questions are poorly targeted. To solve this problem, this paper adopts a personalized question recommendation system based on a deep autoencoder and two-step collaborative filtering. Firstly, considering students’ mastery of knowledge points, two-step collaborative filtering question recommendation based on knowledge points is realized. Secondly, item response theory and a deep autoencoder are used to predict students’ scores and comprehensive scores on the recommended questions involving the recommended knowledge points. Finally, the prediction results are combined into a joint decision, the difficulty of the final personalized recommended questions is controlled, and a list of final recommended questions is generated. Comparison experiments verify that the results of the proposed method are more personalized and accurate than those of traditional question recommendation methods.
    Recommendation Methods Considering User Indirect Trust and Gaussian Filling
    ZHU Pei-pei, LONG Min
    Computer Science    2019, 46 (11A): 178-184.  
    Existing recommendation algorithms introduce users’ explicit trust, which can effectively improve recommendation accuracy, but they do not fully exploit social relationships; indirect trust carries richer potential value in social information and further affects recommendation quality. Although there are related studies on indirect trust, the calculation is complicated and the trust transmission paths are not sufficiently covered. Therefore, using the trust transfer network diagram, the ratio of each branch node to the total path nodes is multiplied node by node to obtain the indirect trust value globally. Secondly, information entropy is used to analyze the actual performance of the user’s social trust relationships, and the trust is adjusted to form IpmTrust, a calculation model of indirect trust. Based on this model, GITCF, a recommendation algorithm that considers users’ indirect trust, is designed. The algorithm uses a Gaussian model to fill the rating matrix and then uses the modified cosine to calculate user similarity. After IpmTrust calculates the indirect trust, the user trust and the similarity are linearly weighted and merged. Finally, improved neighbor prediction is used for recommendation. The experiments were carried out on the Matlab simulation platform, with RMSE and MAE as evaluation metrics. GITCF was compared with existing recommendation algorithms and traditional recommendation algorithms. GITCF improves by nearly 7% over the existing recommendation algorithms and also outperforms the trust-free ones. The experimental results show that the IpmTrust model has a certain degree of validity and that the recommendation algorithm can improve the quality of recommendation results.
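The similarity and fusion steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the GITCF implementation: "modified cosine" is taken to mean cosine similarity of ratings centered on each user's mean, the blend weight `lam` is a hypothetical parameter, and the trust value is supplied by the caller rather than computed by IpmTrust.

```python
import math


def adjusted_cosine(ratings_u, ratings_v):
    """Cosine of mean-centered ratings over the items both users rated.
    Assumption: each user's ratings are centered on that user's own mean."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def fused_weight(similarity, trust, lam=0.5):
    """Linear blend of trust and similarity; `lam` is a hypothetical knob."""
    return lam * trust + (1 - lam) * similarity


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2}
sim = adjusted_cosine(u, v)          # both users rank a > b > c
weight = fused_weight(sim, trust=0.5, lam=0.4)
```

The fused weight would then replace plain similarity when neighbors are weighted in the rating prediction.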
    VID Model of Vehicles-infrastructure-driver Collaborative Control in Big Data Environment
    CHENG Xian-yi, SHI Quan, ZHU Jian-xin, CHEN Feng-mei, DAI Ran-ran
    Computer Science    2019, 46 (11A): 185-188.  
    To address the serious redundancy in the centralized control mode of the Internet of Vehicles and the high implementation cost of mutual reinforcement in multi-source data, this paper describes the VID (Vehicles-Infrastructure-Driver) model of collaborative control from the perspective of big data. The model consists of a perception center and distributed task execution. The unified perception center provides public perception services and integrates perception resource management, task scheduling and data collection. The Vehicles-Infrastructure Cooperative System (VCS), the Driver-Vehicles Cooperative System and Driver Behavior Analysis perform perception tasks in a decentralized way. The VID model opens up the global and local loops from perception to service, and is well suited to scenarios requiring collaboration.
    Research and Application of Multi-label Learning in Intelligent Recommendation
    ZHU Zhi-cheng, LIU Jia-wei, YAN Shao-hong
    Computer Science    2019, 46 (11A): 189-193.  
    Collaborative filtering is used in traditional intelligent recommendation, but it cannot handle users’ rating information well, and data sparsity and extreme data degrade the quality of recommendation. Therefore, this paper transforms the recommendation problem into a multi-label learning problem and proposes a complete intelligent recommendation system based on an HMM model and user portraits. Firstly, different data processing mechanisms are set up to improve the generalization ability of the algorithm. Secondly, an improved HMM model with the anti-Markov property is proposed to solve the problem of data sparsity. Finally, a user portrait is constructed to screen the learning experience of the HMM model and obtain the final recommendation service. Experimental results show that multi-label learning can effectively improve the accuracy and efficiency of intelligent recommendation.
    Multilayer Perceptron Classification Algorithm Based on Spectral Clustering and Simultaneous Two Sample Representation
    LIU Shu-dong, WEI Jia-min
    Computer Science    2019, 46 (11A): 194-198.  
    Classification learning from imbalanced datasets has long been a hot topic in data mining and machine learning. Data-level, algorithm-level and ensemble solutions are the three main approaches to imbalanced learning so far. Undersampling, one of the data-level solutions, is widely used in many imbalanced learning scenarios. However, its drawback is that it discards potentially useful majority-class instances. In this paper, spectral clustering is introduced to sample the majority-class instances so as to build a simultaneous two sample representation. Firstly, all majority-class instances are divided into clusters by spectral clustering, and different numbers of representative samples are extracted from different clusters according to the size of each cluster and the average distance between the minority-class instances and each cluster. Then the simultaneous two sample representation is generated from the instances extracted from the clusters and the minority-class instances. The proposed method not only alleviates the data explosion issue in simultaneous two sample representation but also avoids the loss of useful information caused by random sampling. Finally, experiments on nine groups of datasets from UCI verify its validity.
    User Collaborative Filtering Recommendation Algorithm Based on All Weighted Matrix Factorization
    DENG Xiu-qin, LIU Tai-heng, LIU Fu-chun, LONG Yong-hong
    Computer Science    2019, 46 (11A): 199-203.  
    To address the problem that the traditional user collaborative filtering recommendation algorithm treats users’ preferences for an item as equal, a user collaborative filtering model based on all weighted matrix factorization is proposed. Firstly, the model designs frequency-aware weights for observed values and non-uniform, user-oriented weights for unobserved values. Then the weights of the observed and unobserved values are combined, the similarity between user reputation and user relationship is determined according to the ratings, and the user collaborative filtering model fused with all weighted matrix factorization is constructed. To verify the performance of the proposed recommendation algorithm, experiments were carried out on three real datasets: Douban, Epinions and Last.fm. The experimental results demonstrate that the proposed AWMF_UCFR algorithm achieves significantly higher recommendation accuracy than the MF, WRMF-UO and SoRS algorithms.
    Cell Clustering Algorithm Based on MapReduce and Strongly Connected Fusion
    HU Ying-shuang, LU Yi-hong
    Computer Science    2019, 46 (11A): 204-207.  
    With the explosive growth of large-scale location data, most traditional serial clustering algorithms cannot process big data efficiently. To solve this problem, more and more researchers are studying parallel clustering algorithms. It is difficult to guarantee the clustering quality of a parallel clustering algorithm, so it is important to study algorithms for reducing the results of parallel clustering. Therefore, a grid clustering algorithm based on strongly connected fusion is proposed. Firstly, the clustering results of the data subsets are obtained by an improved DBSCAN algorithm based on MapReduce. Next, the relationship between grids and clusters is analyzed, and the concepts of Grid-cluster, connectivity and strong connectivity of Grid-clusters are defined. Then the connectivity weight matrix between Grid-clusters is calculated. Finally, whether to reduce two Grid-clusters is decided according to the connectivity weight. The experimental results show that the proposed algorithm achieves high efficiency and high clustering quality in processing large-scale location data.
    Implementation of ETL Scheme Based on Storm Platform
    LIANG Kui-kui
    Computer Science    2019, 46 (11A): 208-211.  
    With the continuous development of the Internet in various fields, data has begun to show structural diversity and massive volume. Faced with massive data, improving the efficiency of ETL is crucial. To address the inconsistent data sources and formats and the poor real-time data collection found in “information islands”, this paper proposes vertical segmentation of the ETL workflow and horizontal segmentation of the pending data set, and establishes a stream-based ETL processing scheme on the Storm platform. In addition, because Storm is insensitive to the CPU load of worker nodes during task assignment, the CPU load information of the worker nodes is recorded by a timed task to optimize the slot allocation mode of the Storm scheduler, so that the load of the Storm cluster is more balanced. The experimental results show that the scheme effectively improves the processing efficiency of ETL, and that the slot allocation optimization improves system stability and processing efficiency.
    Feature Selection Method Based on Ant Colony Optimization and Random Forest
    LI Guang-hua, LI Jun-qing, ZHANG Liang, XIN Yan-sen, DENG Hua-wei
    Computer Science    2019, 46 (11A): 212-215.  
    In the face of massive high-dimensional data, eliminating redundant features through feature selection has become one of the important issues in information science and technology today. Traditional feature selection methods are not suited to searching the whole feature space, and their performance and accuracy are low. In this paper, a feature selection method based on ant colony optimization and random forest is proposed. The method takes the importance scores of the random forest as the heuristic factor of ant colony optimization, uses ant colony optimization to search intelligently, and uses the result of feature selection as the evaluation index to feed back to the pheromone of the ant colony in real time. Experiments show that, compared with traditional feature selection methods, this method can effectively reduce the number of features in datasets and improve the accuracy of data classification.
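The interplay between heuristic factor and pheromone can be sketched as below. This is an illustrative sketch, not the paper's algorithm: it assumes the textbook ant colony rule in which a feature is chosen with probability proportional to pheromone^alpha times heuristic^beta, it takes the random forest importance scores as precomputed positive numbers, and `alpha`, `beta`, `rho` and the `score` fed back (for example, a classifier accuracy) are hypothetical parameters supplied by the caller.

```python
import random


def selection_probabilities(pheromone, importance, alpha=1.0, beta=2.0):
    """Probability of picking each feature: pheromone^alpha * importance^beta,
    normalized. `importance` plays the role of the heuristic factor."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, importance)]
    total = sum(weights)
    return [w / total for w in weights]


def build_subset(pheromone, importance, size, rng, alpha=1.0, beta=2.0):
    """One ant picks `size` distinct features, sampling without replacement.
    Importance scores must be positive so that the weights are not all zero."""
    remaining = list(range(len(pheromone)))
    chosen = []
    for _ in range(size):
        weights = [(pheromone[i] ** alpha) * (importance[i] ** beta)
                   for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


def update_pheromone(pheromone, subset, score, rho=0.1):
    """Evaporate every trail by `rho`, then deposit `score` on the features
    in `subset` (the evaluation result fed back to the colony)."""
    return [(1 - rho) * t + (score if i in subset else 0.0)
            for i, t in enumerate(pheromone)]


importance = [0.5, 0.3, 0.2]            # assumed random forest importances
pheromone = [1.0, 1.0, 1.0]
probs = selection_probabilities(pheromone, importance)
subset = build_subset(pheromone, importance, size=2, rng=random.Random(0))
pheromone = update_pheromone(pheromone, subset, score=0.9)
```

Repeating the build and update steps over many ants and iterations concentrates pheromone on features that appear in well-scoring subsets.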
    Nearest Neighbor Optimization k-means Clustering Algorithm
    LIN Tao, ZHAO Can
    Computer Science    2019, 46 (11A): 216-219.  
    Traditional k-means algorithms usually ignore the distribution of the data samples: following the principle of minimum distance, they assign every sample, whether it lies at a cluster edge, at a cluster center or is an outlier, to the cluster of its nearest cluster center, without considering the relationship between the sample and the other clusters. If the distance between a sample and another cluster is close to the minimum distance, the sample is very close to both clusters, and direct assignment is clearly unreasonable. To address this problem, this paper presents a nearest-neighbor-optimized clustering algorithm (1NN-kmeans). Using the idea of nearest neighbors, samples that do not firmly belong to any one cluster are assigned to the cluster to which their nearest neighbor sample belongs. The experimental results show that 1NN-kmeans effectively reduces the number of iterations, improves clustering accuracy and achieves better clustering results.
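The assignment step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rule for deciding that a sample is ambiguous (the ratio of its nearest to its second-nearest center distance exceeding a hypothetical threshold `ratio`) is an assumption, as is the function name. It needs at least two centers and Python 3.8 or later for `math.dist`.

```python
import math


def assign_with_1nn(points, centers, ratio=0.8):
    """One assignment pass: confident points go to their nearest center;
    ambiguous points copy the label of their nearest confident neighbor."""
    labels = [None] * len(points)
    ambiguous = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, c), k) for k, c in enumerate(centers))
        (d1, k1), (d2, _) = ranked[0], ranked[1]
        # Ambiguous when the two closest centers are almost equally far away.
        if d2 > 0 and d1 / d2 > ratio:
            ambiguous.append(i)
        else:
            labels[i] = k1

    confident = [i for i, lab in enumerate(labels) if lab is not None]
    for i in ambiguous:
        if confident:
            nearest = min(confident, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
        else:
            # No confident neighbor exists: fall back to the nearest center.
            labels[i] = min(range(len(centers)),
                            key=lambda k: math.dist(points[i], centers[k]))
    return labels


points = [(0, 0), (1.5, 0), (4, 0), (5, 0), (2.1, 0)]
centers = [(0, 0), (4, 0)]
labels = assign_with_1nn(points, centers)
```

With these centers, the point (2.1, 0) is slightly nearer the second center, but its nearest confidently assigned neighbor, (1.5, 0), belongs to the first cluster, so the sketch places it there. A full algorithm would alternate this pass with the usual center update.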
    Design of Distributed News Clustering System Based on Big Data Computing Framework
    LU Xian-hua, WANG Hong-jun
    Computer Science    2019, 46 (11A): 220-223.  
    Rapid clustering of massive Internet news to generate hot topics is an important research direction. Addressing several key problems of large-scale text clustering, namely similarity calculation, distributed clustering and summary generation for clustering results, this paper designs and implements a Spark-based distributed news clustering system. Firstly, a GPU-accelerated deep similarity algorithm is used to calculate the similarity between news texts. Then a graph clustering algorithm is used for news clustering. Finally, a short title is generated for each class as its description. Experiments show that the proposed system has high performance and good scalability, and can effectively handle hotspot clustering tasks for large-scale news.
    Top-N Personalized Recommendation Algorithm Based on Tag
    MA Wen-kai, LI Gui, LI Zheng-yu, HAN Zi-yang, CAO Ke-yan
    Computer Science    2019, 46 (11A): 224-229.  
    With the development of Web 2.0, UGC tag systems are receiving more and more attention. Tags not only reflect users’ interests but also describe the inherent characteristics of items. Existing tag recommendation algorithms do not consider the influence of users’ continuous behaviors. Although traditional Markov chain-based recommendation algorithms produce recommendations by emphasizing users’ continuous behaviors, they cannot be applied to UGC tag recommendation because they operate directly on the two-dimensional relationship between user and item. Therefore, drawing on the ideas of Markov chains and collaborative filtering, a personalized recommendation algorithm based on tags is proposed. The algorithm splits the three-dimensional relationship 〈user-tag-item〉 into two two-dimensional relationships, 〈user-tag〉 and 〈tag-user〉. Firstly, the interest degree is calculated using a Markov chain. Then the corresponding items are matched through the recommended tags. To raise recommendation accuracy, a satisfaction model, which is a probabilistic model, is built according to the influence of tags and the association relationships among the tags of items. While calculating the interest degree and satisfaction degree for user-tag and user-item, the idea of collaborative filtering is also used to complement sparse data. Compared with existing algorithms, this algorithm substantially improves precision and recall on a public dataset.
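To make the Markov chain step concrete, the sketch below estimates first-order transition probabilities between tags from users' chronological tagging sequences. It is an assumption-laden illustration: the abstract does not give the paper's interest-degree formula, so a plain maximum-likelihood first-order chain is used, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def tag_transition_probabilities(sequences):
    """Estimate P(next tag | current tag) from ordered tag sequences,
    one sequence per user, by counting consecutive pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {tag: n / total for tag, n in followers.items()}
    return probabilities


sequences = [["rock", "jazz", "rock", "pop"], ["rock", "jazz"]]
transitions = tag_transition_probabilities(sequences)
```

Here `rock` is followed by `jazz` twice and by `pop` once, so a user whose latest tag is `rock` would be assigned an interest of 2/3 in `jazz` and 1/3 in `pop` under this simple model.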
    Model of Music Theme Recommendation Based on Attention LSTM
    JIA Ning, ZHENG Chun-jun
    Computer Science    2019, 46 (11A): 230-235.  
    To address the problems of low classification accuracy, long period and the difficulty of meeting the demand for theme music in people’s lives, a neural network model based on an attention mechanism and LSTM (Long Short-Term Memory) is designed. It consists of a music theme model and a music recommendation model. Building on music emotion classification realized with the attention mechanism and the LSTM network, the music theme model combines the audio codebook with the topic model to discriminate the subcategories of music themes under an emotion. In the music recommendation model, low-level descriptors and a spectrogram are used to construct a joint representation of handcrafted features and Convolutional Recurrent Neural Network (CRNN) features. The emotion expressed by the user’s voice is obtained, and the user is given a precise music theme recommendation using this model. In the experiments, the two models were designed separately, and two different traditional models were used as baselines. The experimental results show that, compared with a traditional single model, this model not only improves the classification accuracy of themes but also accurately judges the emotion of the user’s voice data, thereby achieving theme music recommendation.