Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 44, Issue 10, 2017
Review on Tourism Recommendation System
CHANG Liang, CAO Yu-ting, SUN Wen-ping, ZHANG Wei-tao and CHEN Jun-tong
Computer Science. 2017, 44 (10): 1-6.  doi:10.11896/j.issn.1002-137X.2017.10.001
The main research task of current tourism recommendation systems is to provide personalized recommendation services for users and to improve recommendation accuracy and user satisfaction. In this paper, the similarities and differences between tourism recommender systems and traditional recommender systems were analyzed, and the research status of tourism recommender technologies was surveyed from six aspects: content-based recommendation, collaborative-filtering-based recommendation, knowledge-based recommendation, demographics-based recommendation, hybrid recommendation, and location-aware recommendation. As a summary of these research works, a general framework for tourism recommender systems was proposed. Finally, six key and difficult problems in tourism recommender systems were presented, and some research topics that might bring great progress to tourism recommender systems were emphasized.
Reviews of Multiobjective Ant Colony Optimization
DIAO Xing-chun, LIU Yi, CAO Jian-jun and SHANG Yu-ling
Computer Science. 2017, 44 (10): 7-13.  doi:10.11896/j.issn.1002-137X.2017.10.002
Multiobjective ant colony optimization is one of the important multiobjective evolutionary algorithms, and it performs well on multiobjective optimization problems, especially multiobjective combinatorial optimization problems. In this paper, we summarized the development of multiobjective ant colony optimization and classified it into three classes: methods based on Pareto dominance, methods based on indicators, and methods based on decomposition. We also summarized the characteristics and classical algorithms of each class, showed its wide application to real problems, and finally discussed the open problems in multiobjective ant colony optimization.
Dynamic Scheduling Method of Virtual Resources Based on ARIMA Model
YANG Dong-ju and DENG Chong-bin
Computer Science. 2017, 44 (10): 14-18.  doi:10.11896/j.issn.1002-137X.2017.10.003
Deploying applications to the cloud has become an increasingly common practice in industry, and high concurrency and high traffic are major features of most cloud applications. How to handle rising concurrency and surges of user traffic, use resources reasonably, and ensure the stable operation of applications are important issues for cloud resource management. Since adjusting resources purely on the basis of monitoring data easily delays resource scheduling, a dynamic scheduling method for resource adjustment based on the ARIMA prediction model was proposed in this paper. The method calculates the required resource size according to the forecast demand and the load capacity of the current resource scale, and then configures or releases virtual machine resources accordingly. The experimental results show that the prediction model fits the scenario well, and that, using the predictive model, the resource scheduling algorithm can guarantee the quality of cloud services in a timely and effective manner.
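The following is a minimal sketch of the prediction-driven scaling idea described above, using statsmodels' ARIMA; the series, the (p, d, q) order and the per-VM capacity are illustrative assumptions, not the paper's settings.

    import math
    from statsmodels.tsa.arima.model import ARIMA

    def plan_vm_count(load_history, per_vm_capacity=100.0, order=(2, 1, 1)):
        """Fit ARIMA on observed load and size the VM pool for the next step."""
        model = ARIMA(load_history, order=order).fit()
        predicted = float(model.forecast(steps=1)[0])   # one-step-ahead demand
        return max(1, math.ceil(predicted / per_vm_capacity))

    demand = [230, 250, 270, 320, 410, 520, 640, 700, 690, 720, 780, 850]
    print(plan_vm_count(demand))   # scale out before the surge arrives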
Energy-aware Management of Virtual Machines in Data Center
ZHU De-jian, BAI Guang-wei, CAI Yan-wei, REN Dong and SHEN Hang
Computer Science. 2017, 44 (10): 19-25.  doi:10.11896/j.issn.1002-137X.2017.10.004
Large-scale data centers consume a large amount of power, resulting in high operating costs and other issues such as environmental pollution. In order to reduce the energy consumption of the data center, we constructed a management model of the data center and proposed a static placement algorithm and a dynamic adjustment algorithm for virtual machines. Dynamic migration of virtual machines can effectively reduce energy consumption while improving resource utilization; however, excessive migration of virtual machines will affect application quality and cause SLA violations. In the dynamic adjustment stage, we adopted dynamic thresholds to control virtual machine migration and reduce energy consumption. Finally, we conducted extensive experiments with CloudSim. The results show that the energy-aware management of virtual machines (EAMVM) mechanism can reduce both energy consumption and the number of virtual machine migrations.
GPU Accelerated cWB Pipeline for Gravitational Waves Discovery
DU Zhi-hui, LIN Zhang-xi, GU Yan-qi, Eric O.LEBIGOT and GUO Xiang-yu
Computer Science. 2017, 44 (10): 26-32.  doi:10.11896/j.issn.1002-137X.2017.10.005
Gravitational waves (GWs) are an important prediction of Einstein's general relativity theory. Some were generated during the big bang. Their most easily detectable sources are expected to be binaries of orbiting objects such as black holes and/or neutron stars, so their study can give information about some important astrophysical objects. A few large-scale laser interferometer gravitational wave observatories have been built, with the goal of directly detecting GWs for the first time. Coherent Wave Burst (cWB) is an important pipeline that looks for gravitational waves in the data from multiple observatories, simultaneously and in real time. Improving the performance of cWB allows it to perform deeper analyses. We therefore analyzed a time-critical function from cWB, then designed and implemented an efficient acceleration method on GPU. Experimental results show that our method achieves at least a 10x speedup compared with the original CPU implementation with SSE instructions. The results show that our GPU acceleration method is a viable option for improving gravitational wave data processing.
SBV:A Bioinformatics Visualization Software Based on SVG
CAI Rui-chu, LIN Yin-xian and AI Peng
Computer Science. 2017, 44 (10): 33-37.  doi:10.11896/j.issn.1002-137X.2017.10.006
Bioinformatics visualization is an important approach to exploiting the information behind massive biological data. In view of challenges such as massive data size, accurate visualization effects and diversified visualization requirements, we presented a bioinformatics visualization software package based on SVG, called SBV (SVG for Bioinformatics Visualization). SBV takes advantage of the scalability of SVG and the customizable presentation of DOM and CSS to draw a variety of bioinformatics maps. It is a maneuverable, integrative bioinformatics visualization platform supporting most existing bioinformatics visualization requirements. The software has been open-sourced on GitHub, which provides a good foundation for further development.
Research on Essential Protein Identification Method Based on Improved PSO Algorithm
HONG Hai-yan and LIU Wei
Computer Science. 2017, 44 (10): 38-44.  doi:10.11896/j.issn.1002-137X.2017.10.007
Essential proteins are the most important material basis for the maintenance of all life activities in a living body. With the development of high-throughput technology, how to identify essential proteins from protein interaction networks has become a hot research topic in proteomics. Because most existing methods rely only on network topology information for recognition, and protein-protein interaction data have a high false-positive rate, this paper presented an improved particle swarm optimization algorithm to identify essential proteins. We considered network topology characteristics together with multi-source biological attribute information to construct a high-quality weighted network. We also considered the links between protein nodes to measure protein essentiality, and expanded the local network topology to second-order neighbors, which greatly improves accuracy. We further proposed an overall top-p index measure, which reduces computational complexity. The experimental results on standard data sets show that our algorithm is superior to other classical algorithms and can identify more essential proteins with higher accuracy.
Cirrhosis Recognition Based on Improved LBP Algorithm and Extreme Learning Machine
LEI Yi-ming, ZHAO Xi-mei, WANG Guo-dong and YU Ke-xin
Computer Science. 2017, 44 (10): 45-50.  doi:10.11896/j.issn.1002-137X.2017.10.008
Computer-aided diagnosis of cirrhosis is of great significance for the early diagnosis and treatment of liver disease. To address edge blurring and non-uniform echo in cirrhosis lesion areas and the influence of scale factors in B-mode ultrasound images, we proposed an improved LBP algorithm and extracted the corresponding SLBP feature, which depicts the lesion area of cirrhosis more precisely than traditional texture features. Through the combination of SLBP and the two-dimensional Gabor transform, we solved the above difficulties. Because of the long training time of conventional machine learning methods, we adopted an extreme learning machine based method and applied it to cirrhosis recognition for the first time. Experimental results show that classification accuracy on the test set reaches 95.4%, and time efficiency is further improved compared with traditional methods. The comparison between the proposed method and conventional methods via ROC (receiver operating characteristic) curves demonstrates that the proposed method has advantages in both accuracy and generalization performance. The proposed method will be helpful for the clinical diagnosis of cirrhosis.
Study of Hidden Tidal Wave Orientation in Pulse Wave
ZHENG Gang, FAN Lin-lin, SUN Ying and DAI Min
Computer Science. 2017, 44 (10): 51-54.  doi:10.11896/j.issn.1002-137X.2017.10.009
Although the clinical significance of central arterial pressure is superior to that of traditional brachial and radial artery blood pressure, its estimation is bound by the establishment of a GTF (general transform function) and the determination of the tidal wave position in the radial artery wave. In this paper, the GTF was obtained by Fourier transform on published invasively measured central arterial data (MIT MIMIC database, mimicdb), and the tidal wave position was calculated from the central arterial systolic pressure value combined with a wavelet transform of the radial artery wave. It is found that the sixth zero crossing of the radial pulse wave after the sym4 and haar transformations marks the concealed tidal wave position following the maximum of the third-order difference wave. The experimental results show that the accuracy of concealed tidal wave position recognition reaches 91.11%.
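As background for the wavelet step, here is a hedged sketch of locating the n-th zero crossing of single-level sym4 detail coefficients with PyWavelets; the paper's exact decomposition depth and the alignment with the third-order difference maximum are not reproduced.

    import numpy as np
    import pywt

    def nth_zero_crossing(signal, wavelet="sym4", n=6):
        """Index of the n-th zero crossing of the detail coefficients, or None."""
        _, detail = pywt.dwt(signal, wavelet)
        crossings = np.where(np.diff(np.sign(detail)) != 0)[0]  # sign changes
        return int(crossings[n - 1]) if len(crossings) >= n else None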
Improvement of Multiple Sequence Center Star Method and Its Parallelization in Spark
DONG Gai-fang, FU Xue-liang and LI Hong-hui
Computer Science. 2017, 44 (10): 55-58.  doi:10.11896/j.issn.1002-137X.2017.10.010
Because the center star alignment algorithm needs to calculate the distance and score of every pair of input sequences when determining the central sequence, its time complexity is high. A strategy for selecting the central sequence was proposed by jointly counting the k-mers generated by each sequence and the number of occurrences of each k-mer in each sequence. Furthermore, in the pairwise sequence alignment process, the idea of searching for the largest similar subsequences of two sequences was used. The accuracy of the improved center star alignment algorithm is improved to a certain degree. The improved algorithm was then parallelized and implemented in Spark, and Spark's Yarn-Client running mode was used to run experiments on multiple groups of normal mitochondrial data. The performance of the algorithm was analyzed and directions for further improvement were identified.
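A hedged sketch of the k-mer idea: score each sequence by how much of its k-mer profile is shared with the other sequences and pick the best-shared one as the center, avoiding all-pairs alignment; the scoring rule here is an assumption, not the paper's exact strategy.

    from collections import Counter

    def kmer_profile(seq, k=6):
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def pick_center(sequences, k=6):
        profiles = [kmer_profile(s, k) for s in sequences]
        def shared(i):
            # overlap between sequence i's k-mers and every other profile
            return sum(min(profiles[i][m], p[m])
                       for j, p in enumerate(profiles) if j != i
                       for m in profiles[i])
        return max(range(len(sequences)), key=shared)

    print(pick_center(["ACGTACGTGG", "ACGTACGTAA", "TTGTACGTGG"], k=4))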
Openstack-based Virtualized Computing Cluster and Application for High Energy Physics
HUANG Qiu-lan, LI Hai-bo, SHI Jing-yan, SUN Zhen-yu, WU Wen-jing, CHENG Yao-dong and CHENG Zhen-jing
Computer Science. 2017, 44 (10): 59-63.  doi:10.11896/j.issn.1002-137X.2017.10.011
High energy physics computing is a high-performance computing application that requires a large amount of computing resources; if CPU utilization is low, computing efficiency suffers. In a traditional computing environment, static resource management makes it difficult to satisfy the resource requirements of different kinds of jobs, such as sudden jobs, batch jobs, CPU-intensive jobs and IO-intensive jobs. This paper discussed a virtualized computing system based on OpenStack, which schedules jobs by CPU cores, dynamically schedules resources according to the current job and resource status, and greatly improves resource utilization. Firstly, we introduced related research activities, including KVM performance testing and optimization, performance analysis of HEP (high energy physics) jobs running on KVM versus physical machines, and the public cloud service IHEPCloud; together these show that it is entirely acceptable to run HEP jobs on a virtualized platform. Then we presented the design and implementation of the virtualized computing system. Finally, the current status of the virtualized computing cluster is shown, which verifies that the performance of the virtualized computing system can meet the needs of high energy physics computing.
Porting and Optimizing OpenFOAM on Sunway TaihuLight System
MENG De-long, WEN Min-hua, WEI Jian-wen and James LIN
Computer Science. 2017, 44 (10): 64-70.  doi:10.11896/j.issn.1002-137X.2017.10.012
The Sunway TaihuLight supercomputer, based on Chinese-designed many-core processors, is the world's fastest system with a peak performance of 125.4 PFlops. OpenFOAM (open source field operation and manipulation) is one of the most popular open-source computational fluid dynamics (CFD) packages; it is written in C++ and is not fully compatible with the compilers for the heterogeneous many-core processor SW26010. This paper ported OpenFOAM to SW26010's MPE (management processing element)/CPE (computing processing element) cluster architecture. To overcome the compilation incompatibility, we adopted a mixed-language application design. We also applied several SW26010-specific optimizations to the hotspots of OpenFOAM to deliver high performance, such as register communication, vectorization and double buffering. Experiments on SW26010 using real datasets show that the single-CG (core group) code runs 8.03x faster than the well-tuned version on the MPE, and that single-CG performance is 1.18x that of a serial implementation on an Intel(R) Xeon(R) CPU E5-2695 v3. We also optimized the parallel implementation of OpenFOAM, yielding a speedup of 184.9x on 256 CGs. The porting methods and optimizations presented can also serve as a reference for bringing other complex C++ programs to high performance on SW26010.
Implementation and Performance Evaluation of Recommender Algorithms Based on Multi-/Many-core Platforms
CHEN Jing, FANG Jian-bin, TANG Tao and YANG Can-qun
Computer Science. 2017, 44 (10): 71-74.  doi:10.11896/j.issn.1002-137X.2017.10.013
In this paper, we designed and implemented two typical recommender algorithms, alternating least squares (ALS) and cyclic coordinate descent (CCD), in OpenCL. We then evaluated them on Intel CPUs, NVIDIA GPUs and Intel MIC, and investigated the performance-impacting factors: the latent feature dimension and the number of threads. Meanwhile, we compared the OpenCL implementation with CUDA and OpenMP implementations. Our experimental results show that, under the same conditions, CCD converges faster and more steadily but is more time-consuming than ALS. We also observed that OpenCL performance is better than CUDA and OpenMP on the same platform: training on the GPU is slightly faster than with the CUDA implementation (1.03x for CCD and 1.2x for ALS), and training on the CPU is 1.6-1.7x faster than with the OpenMP implementation using 16 threads. When running the OpenCL implementation on different platforms, we noticed that the CPU performs better than both the GPU and the MIC.
Design and Optimization of Hybrid Storage System in HEP Environment
XU Qi, CHENG Yao-dong and CHEN Gang
Computer Science. 2017, 44 (10): 75-79.  doi:10.11896/j.issn.1002-137X.2017.10.014
Computing in high energy physics (HEP) is a typical data-intensive application, including simulation, reconstruction and physics analysis. HEP experiment files are generally very large and are usually accessed by skipping through large data blocks, so the performance of accessing big files is one of the decisive factors for an HEP computing system. This paper first analyzed the typical structure of a high energy physics computing environment and its file access characteristics, introduced the advantages of hybrid storage systems in high energy physics, summarized the data access patterns, and evaluated the performance of different read/write modes. It then proposed a new deployment model for a hybrid storage system in high energy physics, which is shown to deliver higher I/O performance while keeping costs low. The test results show that the hybrid storage system performs well in fields such as HEP and can help achieve better I/O performance at a lower price. Finally, the future of hybrid storage systems was discussed.
Parallel Design and Optimization of Galaxy Group Finding Algorithm with Comparison of SGI and Distributed-memory Cluster
SI Yu-meng, WEI Jian-wen, Simon SEE and James LIN
Computer Science. 2017, 44 (10): 80-84.  doi:10.11896/j.issn.1002-137X.2017.10.015
The halo-based galaxy group finder (HGGF) is an effective algorithm that finds galaxy groups based on galaxy coordinates, redshift, mass, etc., and greatly helps research on galaxy group formation and evolution. However, the current pure OpenMP implementation of the algorithm is limited by the resources of a single compute node when dealing with large-scale group finding problems. One possible solution is to use resources from multiple nodes to reduce execution time on large problems, which requires redesigning and reimplementing the algorithm. The major hurdle for such an attempt is remote memory access, caused by the semi-random galaxy accesses in the algorithm, which damages performance in a multi-node environment. To tackle this problem, we parallelized the algorithm with an adjacent galaxy list design and implemented it in Unified Parallel C (UPC), achieving kernel speedups of 2.25x, 2.78x and 5.07x on 4, 8 and 16 nodes respectively, while significantly reducing the memory requirement on each node. Experiments with the OpenMP version of the algorithm on SGI UV 2000 show that, due to the nature of the program and the features of the NUMA architecture, programs with random memory access behavior like HGGF may not readily benefit from the large number of threads and shared memory provided by such machines. A two-level parallel design that exploits the locality principle on distributed-memory clusters may be a better solution.
Evaluation of Resource Management Methods for Large High Energy Physics Computer Cluster
SUN Zhen-yu, SHI Jing-yan, JIANG Xiao-wei, ZOU Jia-heng and DU Ran
Computer Science. 2017, 44 (10): 85-90.  doi:10.11896/j.issn.1002-137X.2017.10.016
High energy physics data consist of multiple independent events, so a high energy physics computing task can be parallelized by running multiple jobs that process different data files simultaneously; high energy physics computing is therefore a typical high-throughput computing scenario. The computer cluster running at the Institute of High Energy Physics (IHEP) uses the open-source TORQUE/Maui for resource management and job scheduling. IHEP enforces a fair-use policy by dividing the cluster's computing resources into multiple queues and limiting the maximum number of running jobs per user; however, this leads to low overall resource usage. SLURM and HTCondor are both popular open-source resource management systems: SLURM offers a rich set of job scheduling policies, while HTCondor is well suited to high-throughput computing. Both are possible resource management solutions to replace the old, no-longer-serviced TORQUE/Maui. In this paper, the job submission behavior of users from the Daya Bay experiment was replayed on SLURM and HTCondor test clusters to examine their resource allocation behavior and efficiency. Their scheduling results were then compared with the actual scheduling results of the same jobs on the IHEP TORQUE/Maui cluster. Finally, the strengths and weaknesses of SLURM and HTCondor were analyzed, and the practicability of using SLURM or HTCondor to manage the IHEP computer cluster was discussed.
Modeling and Prediction on Train Communication Network Traffic of CRH2 EMUs
GE Shi-chun, LIU Xiong-fei and ZHOU Feng
Computer Science. 2017, 44 (10): 91-95.  doi:10.11896/j.issn.1002-137X.2017.10.017
To address the increasing complexity of CRH2 train network traffic data, a method based on principal component analysis (PCA) and a back-propagation neural network (BP network) was proposed to model and predict network traffic. Traffic on the various links of the network was collected on a purpose-built CRH2 train communication simulation platform. To reduce the complexity of analysis, dimension reduction was first carried out with PCA, and the reduced data were then fed into the BP network for prediction. Experiments show that the method can effectively fit the trend of train network traffic, providing a concrete reference for fault diagnosis of the CRH2 train communication network.
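The PCA-then-BP pipeline can be pictured with scikit-learn as below; the feature dimensions, component count and layer sizes are illustrative assumptions rather than the paper's configuration.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.random((200, 20))        # 200 time windows of traffic on 20 links
    y = X[:, :5].sum(axis=1)         # synthetic aggregate-traffic target

    model = make_pipeline(
        PCA(n_components=5),         # keep the dominant traffic components
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )
    model.fit(X, y)
    print(model.predict(X[:3]))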
Improved Link Prediction Method for Weighted Networks
CHEN Xu and CHEN Ke-jia
Computer Science. 2017, 44 (10): 96-98.  doi:10.11896/j.issn.1002-137X.2017.10.018
Link mining in complex networks has been extensively studied, but there are only a few related works on weighted networks, and their results are not satisfactory. A new link prediction method for weighted networks was proposed by improving the weighted similarity measure of the network structure. The new method is based on the assumption that when link xz is strong and link zy is weak, the path 〈x,z,y〉 contributes least to the link between nodes x and y. Accordingly, in the new method, the stronger link xz is and the weaker link zy is, the more the contribution of path 〈x,z,y〉 to the similarity score S(x,y) between nodes x and y is weakened. Comparative experiments on the weighted datasets USAir and NetScience show that the proposed method performs better on the AUC indicator.
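A minimal sketch of the stated intuition, with an assumed weighting function (balance ratio times total weight) standing in for the paper's exact measure: an unbalanced strong-weak path 〈x,z,y〉 contributes little to S(x,y).

    def similarity(G, x, y):
        """G[u][v] is a positive link weight; sums contributions over common neighbours."""
        score = 0.0
        for z in set(G[x]) & set(G[y]):
            wxz, wzy = G[x][z], G[z][y]
            balance = min(wxz, wzy) / max(wxz, wzy)  # 1 when equal, near 0 when unbalanced
            score += balance * (wxz + wzy)
        return score

    G = {"x": {"z": 5.0}, "y": {"z": 0.5}, "z": {"x": 5.0, "y": 0.5}}
    print(similarity(G, "x", "y"))   # the strong-weak pair is heavily down-weighted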
DOA Estimating Algorithm Based on Grid-less Compressive Sensing
ZHANG Xing-hang, GUO Yan, LI Ning and SUN Bao-ming
Computer Science. 2017, 44 (10): 99-102.  doi:10.11896/j.issn.1002-137X.2017.10.019
Basis mismatch exists in DOA estimation based on traditional compressive sensing theory. Grid-less compressive sensing based on the ADMM algorithm is a good solution, but the convergence rate of the traditional ADMM algorithm is low. To solve this problem, the AP-ADMM algorithm was proposed in this paper. According to the power of the input signals, the AP-ADMM algorithm chooses the initial value of the penalty parameter adaptively, and it converges while adapting the penalty across iterations. The convergence rate of the proposed algorithm is much higher than that of the traditional ADMM algorithm, while its accuracy and probability of successful recovery are close to those of the traditional ADMM algorithm. The simulation results demonstrate the efficiency of the proposed algorithm.
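For readers unfamiliar with adaptive penalties in ADMM, the sketch below shows the generic pattern: seed the penalty from input signal power and rebalance it from the primal/dual residuals at each iteration. This is the standard residual-balancing rule given as an analogue of AP-ADMM, not the paper's exact update.

    import numpy as np

    def initial_rho(y, scale=1.0):
        return scale * float(np.mean(np.abs(y) ** 2))  # penalty seeded by signal power

    def update_rho(rho, r_primal, s_dual, mu=10.0, tau=2.0):
        if r_primal > mu * s_dual:
            return rho * tau   # primal residual dominates: tighten the penalty
        if s_dual > mu * r_primal:
            return rho / tau   # dual residual dominates: relax the penalty
        return rho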
Cooperative Spectrum Sensing Based on Reputation Mechanism in Cognitive Ad hoc Networks
QI Quan, WANG Ke-ren and DU Yi-hang
Computer Science. 2017, 44 (10): 103-108.  doi:10.11896/j.issn.1002-137X.2017.10.020
To improve the accuracy of spectrum sensing and resist possible SSDF attacks in cognitive ad hoc networks, a new reputation-based cooperative spectrum sensing scheme for cognitive ad hoc networks was proposed. First, a detection factor is introduced to describe the perception ability of different SUs, and the SUs are divided into clusters according to a fairness-based clustering method. Then, the reputation value of each SU in a cluster is set and updated according to its sensing results. Finally, a detection-factor decision mechanism is designed for spectrum sensing data fusion, and the theoretical upper bounds of the missed detection and false alarm probabilities are derived. The simulation results show that this scheme can effectively identify malicious users and resist SSDF attacks, with better fault tolerance and smaller false alarm and missed detection probabilities.
Research on Wireless Channel Resource Allocation Algorithm Based on Particle Swarm Optimization Algorithm
WANG Xiao-nan, JU Yong-feng, GAO Ting and ZHANG Fu-quan
Computer Science. 2017, 44 (10): 109-112.  doi:10.11896/j.issn.1002-137X.2017.10.021
In order to maximize the network utility of multimedia wireless channel resource allocation, a new channel time allocation algorithm based on particle swarm optimization was proposed. The algorithm optimizes the time allocated to each device in the network so as to maximize the quality of service (QoS) for each network user. The proposed algorithm combines a diversity-increasing function with a learning method based on the individual optimal value, improving on the adaptive particle swarm optimization algorithm; its convergence speed increases while QoS is continuously enhanced. The proposed algorithm was tested in a gigabit network environment with up to 40 devices. Experimental results show that the proposed algorithm can greatly improve resource allocation capability, especially for large network sizes.
Node Forwarding Strategy with Collision Estimation in Urban Vehicular Ad Hoc Networks
HU Chang-jun and YUAN Shu-jie
Computer Science. 2017, 44 (10): 113-116.  doi:10.11896/j.issn.1002-137X.2017.10.022
Aiming at the high collision rate in message transmission, low transmission efficiency and unreliable routing caused by the uneven distribution of vehicles in urban vehicular ad hoc networks, a node forwarding strategy with collision estimation (NFCE) was proposed based on the irresponsible forwarding (IF) algorithm. First, a vehicular node receiving a message from others estimates the probability of collision when forwarding the message. If this probability is below a certain threshold, the node then determines its own forwarding probability based on the node density around it, its communication radius and its distance from the source node; nodes with higher forwarding probability have higher priority to forward the message. Simulation results show that, compared with other typical algorithms, the NFCE algorithm reduces transmission collisions, and its routing is more efficient and reliable, especially at the high vehicle densities of urban environments, so NFCE is well suited to urban applications.
Research on Cooperative Clustering-based MAC Protocol for Vehicular Ad Hoc Network
YE Xiang, ZHANG Guo-an, JIN Xi-long and CHEN Feng
Computer Science. 2017, 44 (10): 117-121.  doi:10.11896/j.issn.1002-137X.2017.10.023
Owing to advances in wireless communication technologies, vehicular ad hoc networks (VANETs) have become an emerging field of research. According to the characteristics of VANETs and the strict delay constraints and high reliability requirements of safety messages, we presented a cooperative clustering-based MAC (CCB-MAC) protocol for safety messages. In CCB-MAC, selected helpers relay the safety message to nodes that failed to receive it during the broadcast period; cooperation is conducted in idle slots without interrupting normal transmission. Both numerical analysis and simulation results show that the proposed protocol improves the probability of successful packet transmission and significantly reduces transmission delay and packet loss rate.
Research on Temporal Centrality Prediction of Nodes in Complex Networks
TONG Lin-ping, XU Shou-zhi, ZHOU Huan and JIANG Ting-yao
Computer Science. 2017, 44 (10): 122-126.  doi:10.11896/j.issn.1002-137X.2017.10.024
In this paper, three kinds of temporal centrality of nodes in complex networks were predicted. Analysis of the temporal centrality values of nodes at different times in real datasets shows that these values are highly correlated across time. Based on this observation, we proposed several methods to predict the future temporal centrality values of nodes in real datasets. Then, through error analysis between the real and predicted values, the performance of the different prediction methods on different real datasets was compared. The results show that the recent weighted average method performs best on the MIT Reality trace, while the recent uniform average method performs best on the Infocom 06 trace.
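A minimal sketch of the two averaging predictors named above; the window length and decay factor are illustrative assumptions.

    def recent_uniform_average(history, window=5):
        recent = history[-window:]
        return sum(recent) / len(recent)

    def recent_weighted_average(history, window=5, decay=0.7):
        recent = history[-window:]
        weights = [decay ** (len(recent) - 1 - i) for i in range(len(recent))]
        return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

    centrality = [0.12, 0.15, 0.14, 0.18, 0.21, 0.20]   # snapshots over time
    print(recent_weighted_average(centrality))          # forecast for the next snapshot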
Analysis and Comparison of Privacy Leak Static Detection Tools for Android Applications
YAN Ji-wei, LI Ming-su, LU Qiong, YAN Jun and GAO Hong-yu
Computer Science. 2017, 44 (10): 127-133.  doi:10.11896/j.issn.1002-137X.2017.10.025
In recent years, privacy leaks in Android applications have attracted more and more attention, as malicious access to private information increases the risk of users' privacy being leaked. To address this problem, researchers have proposed many privacy-leak detection tools that differ in emphasis and performance. To help researchers understand and use them, this paper analyzed and compared nine static privacy-leak detection tools for Android apps, summarizing their detection targets, methods, types of detected errors and efficiency. We also designed and conducted experiments on two open-source tools, FlowDroid and IccTA, to test their performance and detection ability. For the 50 downloaded apps, FlowDroid detected privacy leaks in 9 apps and IccTA detected ICC leaks in 7 apps. For the 12 self-designed test cases, FlowDroid and IccTA successfully detected all privacy leaks.
Design and Formal Verification of Xen Hybrid Multi-policy Model
ZHU Xian-wei, ZHU Zhi-qiang and SUN Lei
Computer Science. 2017, 44 (10): 134-141.  doi:10.11896/j.issn.1002-137X.2017.10.026
As a popular open-source virtualization tool, Xen has attracted more and more attention, and XSM, the Xen security module, determines its security. Native XSM does not apply differentiated security control to system resources and uses Dom0 as the single administrative domain for virtual machines, which violates the principle of least privilege. To address these problems, we designed a hybrid multi-policy model named SV_HMPMD. To improve the practicability of BLP, the model introduces multi-level security labels, and to divide privileges in finer detail, it combines DTE with RBAC. We designed a hierarchical model that describes SV_HMPMD with formal methods for XSM, and verified the consistency between the implementation and the security requirements using Isabelle/HOL.
RFID Authentication Protocol Based on Pseudo ID and Certification by Strand Space Model
XU Yang, YUAN Jin-sha, GAO Hui-sheng, HU Xiao-yu and ZHAO Zhen-bing
Computer Science. 2017, 44 (10): 142-146.  doi:10.11896/j.issn.1002-137X.2017.10.027
A secure and effective authentication protocol is a powerful guarantee of RFID system security, and an appropriate formal analysis method can provide a valid proof for an RFID authentication protocol. In this paper, an RFID authentication protocol based on pseudo IDs was designed, where the pseudo ID is generated from the tag's ID, the tag's authentication value and a random number. The tag's ID does not appear during protocol execution, which reduces the possibility of attacks on the system. The protocol achieves authentication with a hash of the tag's ID, the tag's authentication value and the random number. The protocol was formally analyzed with the strand space model, its bundle diagram was established, and its security and authentication properties were proved. Compared with common hash-based protocols, the proposed protocol resists attacks with low computation cost and achieves mutual authentication between tag and reader.
Impossible Differential Attack on 12-round Block Cipher ESF
GAO Hong-jie and WEI Hong-ru
Computer Science. 2017, 44 (10): 147-149.  doi:10.11896/j.issn.1002-137X.2017.10.028
ESF is a lightweight block cipher with a generalized Feistel structure of 32 rounds, and its round function employs an SPN structure; the block size of ESF is 64 bits and the key size is 80 bits. To analyze the resistance of ESF to impossible differential cryptanalysis, based on an 8-round impossible differential path and the relationships among the round keys, 12-round ESF was attacked by adding two rounds before and two rounds after the path. The results show that attacking 12-round ESF requires a data complexity of O(2^53) and a time complexity of O(2^60.43), so 12-round ESF is not immune to impossible differential cryptanalysis.
Mixed Flow Policy Based On-demand Distributed Cloud Information Flow Control Model
DU Yuan-zhi, DU Xue-hui and YANG Zhi
Computer Science. 2017, 44 (10): 150-158.  doi:10.11896/j.issn.1002-137X.2017.10.029
In order to protect the security of user information in virtual machines on cloud platforms, this paper proposed a mixed-flow-policy, on-demand distributed information flow control model (MDIFC). The model extends DIFC and introduces taint propagation to track sensitive data, so that the system can enforce the policy and better protect user data. To improve the flexibility of the model and respect the initiative of virtual domains, the concepts of on-demand control and output classification were proposed, which also reduce the workload caused by taint propagation. This paper specified the model using the π-calculus, proved the noninterference security property of the MDIFC system with the PicNic tool, and finally demonstrated MDIFC with an example.
DWNAF:A Dynamic Window NAF Scalar Multiplication with Threshold
SHI Liang and XU Ming
Computer Science. 2017, 44 (10): 159-164.  doi:10.11896/j.issn.1002-137X.2017.10.030
In order to improve the safety of data transmission over underwater acoustic channels, and in view of the fact that asymmetric encryption demands high node performance, a dynamic window NAF scalar multiplication with a threshold (DWNAF) was proposed for underwater acoustic sensor networks. The method extends the classic width-ω NAF method with dynamic control through a threshold, optimizing the pretreatment process and effectively reducing the pre-computation in scalar multiplication. Experimental results show that, under the same pre-computation, the number of point additions in DWNAF is only 25% of that in RWNAF. For security, DWNAF combines the window method, an energy balance method and a masking method, which can effectively resist common side-channel attacks such as SPA, DPA and their variants RPA and ZPA.
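For context, the classic width-ω NAF recoding that DWNAF builds on is sketched below; the paper's threshold-driven dynamic window is not reproduced here.

    def width_w_naf(k, w=4):
        """Width-w NAF digits of k, least significant first; nonzero digits are odd."""
        digits = []
        while k > 0:
            if k & 1:
                d = k % (1 << w)            # k mod 2^w
                if d >= (1 << (w - 1)):
                    d -= (1 << w)           # map into (-2^(w-1), 2^(w-1))
                k -= d
            else:
                d = 0
            digits.append(d)
            k >>= 1
        return digits

    print(width_w_naf(123456789))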
Election Scheme Optimization of Redis Cluster Based on Bully Algorithm
WANG Fen, GU Nai-jie and HUANG Zeng-shi
Computer Science. 2017, 44 (10): 165-170.  doi:10.11896/j.issn.1002-137X.2017.10.031
With the rapid development of the Internet, users obtain more and more information from systems, and the frequency of system accesses also grows rapidly. When a large number of clients access a system, request response time greatly increases and traditional relational databases cannot meet user demand, whereas an in-memory database guarantees system stability, improves the user experience, and is therefore increasingly widely used. As a NoSQL in-memory database, Redis supports many data types and is applicable to many caching and storage requirements. In this paper, we mainly studied Redis Cluster, the distributed implementation of Redis, which supports master-slave replication, offers a degree of fault tolerance and linear scalability, and has recently been used by Sina Weibo, GitHub and others. Despite its wide use, the current Redis Cluster occasionally needs a long recovery time after a node fails, which stems from its election algorithm, an implementation of the Raft algorithm. In this paper, we analyzed the reliability of Redis Cluster and optimized its election algorithm. Test results show that the optimized cluster can successfully recover within 50 seconds when a single master node goes offline, a 40% improvement over the community version of the cluster.
Research on Detail Level Index Technology of Massive 3D Point Cloud Data in Virtual Tourism
ZHAO Er-ping, DANG Hong-en and LIU Wei
Computer Science. 2017, 44 (10): 171-176.  doi:10.11896/j.issn.1002-137X.2017.10.032
3D point cloud data in virtual tourism are particularly huge, and batch indexing has become a research hotspot. Many index trees suffer from problems such as spatial overlap between sibling nodes, lack of level-of-detail indexing and low indexing efficiency. Therefore, point reflection intensity and level-of-detail technology were introduced into the R-tree, and LODR-tree, based on an improved R-tree, was presented. Before building the tree, the point cloud data need to be pre-processed by sorting, grouping, removing spatial overlap and so on. Index records in the leaf nodes that meet the threshold conditions are inserted into the corresponding non-leaf nodes along the parent-grandfather-great-grandfather lineage, and the LOD index tree is created by this method. Data redundancy is controlled by reflection intensity, and query optimization is achieved by pyramid-cutting technology. Finally, experiments show that LODR-tree has obvious advantages in LOD indexing and query efficiency.
Clustering Architecture-based Skyline Query Processing in Wireless Sensor Networks
LI Qing, XIAO Ying-yuan, WANG Xiao-ye and LI Yu-kun
Computer Science. 2017, 44 (10): 177-181.  doi:10.11896/j.issn.1002-137X.2017.10.033
The existing skyline query algorithms based on a single server obviously cannot be applied to distributed multi-hop ad hoc networks such as wireless sensor networks. In this paper, we proposed a clustering-based skyline query method for such networks. Clustering-architecture-based routing is adopted, and the data tuple with the greatest dominating power is selected as a global filter to discard data that cannot satisfy the skyline condition, reducing the communication overhead of sensor nodes during skyline query processing. Meanwhile, a sliding window mechanism is introduced into the query processing, which further reduces communication overhead. Extensive experimental results show that the proposed skyline query algorithm has good energy consumption performance.
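The global-filter step can be pictured as below: a sensor node drops any tuple dominated by the filter tuple before transmitting. "Smaller is better" on every attribute is an assumption made for illustration.

    def dominates(a, b):
        """True if a is no worse than b everywhere and strictly better somewhere."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def filter_readings(readings, global_filter):
        return [r for r in readings if not dominates(global_filter, r)]

    print(filter_readings([(3, 7), (1, 9), (5, 5)], global_filter=(2, 6)))
    # (3, 7) is dominated by the filter and never leaves the node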
Multi-criteria Recommendation Algorithm Based on Codebook-clustering and Factorization Machines
DING Yong-gang, LI Shi-jun, YU Wei and WANG Jun
Computer Science. 2017, 44 (10): 182-186.  doi:10.11896/j.issn.1002-137X.2017.10.034
The sparsity of user-item ratings is a common problem, and users who share similar preferences on multiple criteria cannot be found by using only a single overall rating to calculate user similarity in traditional collaborative filtering, which affects recommendation accuracy. Multi-criteria recommendation algorithms try to find users who share similar preferences on multiple criteria, but data sparsity becomes even worse owing to the high cost of rating. To address these problems, we proposed an algorithm that first obtains users' rating-style information based on the idea of codebook clustering, and then co-clusters users and items on each criterion. Finally, the algorithm makes recommendations with factorization machines (FMs) based on users, items, multi-criteria ratings and rating style. The experimental results show that the multi-criteria recommendation algorithm based on codebook clustering and FMs can alleviate the data sparsity problem to some extent, thus improving recommendation accuracy.
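As background, a second-order factorization machine scores a feature vector as y(x) = w0 + <w, x> + sum over i<j of <V_i, V_j> x_i x_j; the O(kn) form of the pairwise term is sketched below with illustrative parameters.

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Second-order FM score; V has one k-dimensional latent row per feature."""
        linear = w0 + w @ x
        pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
        return linear + pairwise

    rng = np.random.default_rng(0)
    n, k = 8, 3                      # 8 features, 3 latent factors
    print(fm_predict(rng.random(n), 0.1, rng.random(n), rng.random((n, k))))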
Affinity Propagation Clustering Algorithm Based on Density Adjustment and Manifold Distance
XIA Chun-meng, NI Zhi-wei, NI Li-ping and ZHANG Lin
Computer Science. 2017, 44 (10): 187-192.  doi:10.11896/j.issn.1002-137X.2017.10.035
Since affinity propagation (AP) clustering is sensitive to the scaling parameter and to datasets of various shapes when calculating the similarity matrix, and its clustering results are not ideal, an affinity propagation clustering algorithm based on density adjustment and manifold distance was proposed. The algorithm introduces the local density of the data and manifold theory into affinity propagation clustering, and uses a distance measure based on manifold structure and density adjustment to better describe the actual structure of clusters, making up for the deficiency of the similarity matrix while also being more efficient. Simulation experiments on artificial and standard datasets show the effectiveness and superiority of the proposed algorithm.
Stock Price Movement Prediction Based on Multiple Sources
RAO Dong-ning, DENG Fu-dong and JIANG Zhi-hua
Computer Science. 2017, 44 (10): 193-202.  doi:10.11896/j.issn.1002-137X.2017.10.036
Predicting stock price movement is a hot topic in the financial intelligence field. People have continuously attempted to use various data sources for stock price prediction, such as fundamental economic features, technical indicators, Internet public opinion, financial announcements, financial news and financial research reports. However, most previous studies use only one or two distinct data sources to build prediction models; few exploit three or more sources simultaneously. Undoubtedly, more sources provide richer information content and more information levels, but since the natures of the sources differ and they affect the stock market differently, it is not easy to fuse several sources when predicting stock prices, and multiple sources naturally increase the risk of the curse of dimensionality. Based on the idea of information fusion, this paper used three distinct sources to predict stock price movement: fundamental economic features, technical indicators and Internet public opinion. Our method first collects the source data, then applies source-specific preprocessing to form a unified dataset, and finally uses an SVM classifier to build prediction models. Experimental results show that the prediction model based on all three sources outperforms models using a single source or sources in pairs, when a linear kernel is chosen for the SVM classifier and data from non-trading days are added. When collecting data, we also found that the volume of Internet public opinion rose sharply even though there were no transactions on non-trading days (for example, weekends or suspension periods); adding the text sentiment data from non-trading days improved prediction accuracy. This study shows that although integrating multiple sources for stock prediction is difficult, appropriate feature selection and source-specific preprocessing can produce a good predictor.
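The fusion step reduces to concatenating the three feature groups before training a linear-kernel SVM; the sketch below uses synthetic data, and all shapes and names are assumptions.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    fundamentals = rng.random((300, 6))
    technicals = rng.random((300, 10))
    sentiment = rng.random((300, 3))     # includes non-trading-day opinions
    X = np.hstack([fundamentals, technicals, sentiment])
    y = rng.integers(0, 2, 300)          # 1 = price up, 0 = down (synthetic)

    clf = SVC(kernel="linear").fit(X[:250], y[:250])
    print(clf.score(X[250:], y[250:]))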
Study on Identification of Adaptive Inverse Control System Based on Dynamic Function Link Neural Network
HU Tao-tao, KANG Bo and SHAN Yao-nan
Computer Science. 2017, 44 (10): 203-208.  doi:10.11896/j.issn.1002-137X.2017.10.037
Adaptive inverse control can eliminate system disturbances and independently control dynamic response performance, and the performance of an adaptive inverse control system depends on the accuracy of the identification models of the system object, the inverse object and the controller. In this paper, a dynamic functional link neural network was proposed to realize simultaneous online modeling of the adaptive inverse control system object and inverse object, together with offline modeling of the controller, transforming the identification of model parameters into an optimization over the parameter space. Because chaos initialization destroys the structure of a converged population, this paper presented a variable-parameter chaotic particle swarm optimization algorithm to optimize the weights of the neural network. Simulation experiments show that the modeling error of the dynamic functional link neural network is small and its identification accuracy is high. Compared with current model reference adaptive control methods, the proposed method achieves better disturbance cancellation and improves the tracking response performance of the system, verifying its effectiveness and feasibility.
Multi-objective Signal Simulation Optimization for Urban Oversaturated Arterial
GAO Guang, ZHAO Xin-can and WANG Li-ming
Computer Science. 2017, 44 (10): 209-215.  doi:10.11896/j.issn.1002-137X.2017.10.038
To solve the signal optimization problem of urban oversaturated arterials, a traffic signal simulation optimization model was put forward by analyzing the impact of traffic control objectives on vehicle queues. The model takes green split, phase sequence, offset and cycle length as the optimized parameters, and selects average vehicle delay, the system's average queue-to-lane-length ratio and system capacity as the optimization goals. To implement the model, a framework was constructed in which a self-built microscopic traffic simulation environment is used to obtain the evaluation indices of a given signal scheme, and the repeated-individual problem of the multi-objective optimization algorithm (NSGA-II) is improved, so that the signal timing schemes of the arterial intersections are optimized simultaneously. Finally, traffic data collected from an arterial consisting of three intersections were used to verify the model. The experimental results indicate that the proposed model not only effectively controls vehicle queue length and balances the vehicle distribution, but also performs better on system capacity and average delay.
Adaptive Water Wave Optimization Algorithm Based on Simulated Annealing
WANG Wan-liang, CHEN Chao, LI Li and LI Wei-kun
Computer Science. 2017, 44 (10): 216-221.  doi:10.11896/j.issn.1002-137X.2017.10.039
Water wave optimization (WWO) is a novel evolutionary algorithm inspired by shallow water wave theory. In this paper, we developed a modified version of the simplified water wave optimization algorithm (SimWWO). To fully utilize the history information and experience of the waves, we proposed an adaptive parameter adjustment strategy: the performance of waves during the evolutionary process is used as feedback to adjust the wavelength coefficient adaptively and improve search efficiency. Meanwhile, to avoid getting trapped in local optima, the idea of simulated annealing is adopted to accept inferior solutions with a certain probability. With these two operations, the algorithm achieves a better balance between global search and local search. Computational experiments on the CEC 2015 single-objective optimization test problems show that the modified algorithm effectively improves overall performance.
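The simulated-annealing acceptance step grafted onto the wave update can be sketched as follows; the geometric cooling schedule is an illustrative assumption.

    import math
    import random

    def accept(candidate_cost, current_cost, temperature):
        """Keep better waves always; keep worse waves with probability exp(-delta/T)."""
        if candidate_cost <= current_cost:
            return True
        return random.random() < math.exp(-(candidate_cost - current_cost) / temperature)

    T, alpha = 1.0, 0.95
    # inside the WWO loop: if accept(f(new_wave), f(wave), T): wave = new_wave
    # then T *= alpha after each generation (geometric cooling)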
Research of Multi-branch Precipitation Probability Forecasting Model
YU Lin, LV Xin, ZHOU Si-qi and LIU Xuan
Computer Science. 2017, 44 (10): 222-227.  doi:10.11896/j.issn.1002-137X.2017.10.040
Precipitation plays a decisive role in water dispatching decisions, flood early warning, drought control and other areas. Many precipitation forecasting models have been put forward, but because the nonlinear characteristics of the precipitation process are not considered, their forecasting accuracy is not high; in addition, a single forecast value can hardly support decisions effectively, which lowers the applicability of the results. To address these problems, year-on-year and month-on-month branch forecasting models were constructed based on the stationarity and periodicity of precipitation, and a multi-branch precipitation probability forecasting model (MBPPFM) was proposed. A cross-selection algorithm is used in the model to screen the forecasting results from the year-on-year and month-on-month branches, improving forecasting accuracy and avoiding abnormal forecasts. Probability and confidence values are also included in the forecasting results to effectively support decision making.
Anaphoricity Determination of Uyghur Personal Pronouns Based on Deep Belief Network
QIN Yue, YU Long, TIAN Sheng-wei, ZHAO Jian-guo and FENG Guan-jun
Computer Science. 2017, 44 (10): 228-233.  doi:10.11896/j.issn.1002-137X.2017.10.041
Aiming at the noise introduced into anaphoricity determination of personal pronouns in the Uyghur language, we presented a method based on deep belief networks (DBNs). After analyzing the grammatical features and linguistic rules of personal pronouns in Uyghur, we summarized an anaphoricity determination feature set containing ten features. First, the restricted Boltzmann machine (RBM) network is trained layer by layer in a greedy way, so that the feature vector is mapped into different spaces while retaining as much characteristic information as possible. Then a BP network is set up as the last layer to classify the feature vectors output by the RBMs, and the entire network is trained in a supervised way and fine-tuned. The experimental results show that the accuracy of anaphoricity determination for Uyghur personal pronouns reaches 95.17%, an improvement of 9% over the SVM algorithm, demonstrating the validity and availability of the method.
Research on Member Search Engine Selection in Meta Search
LIU Deng-hong and XU Xian
Computer Science. 2017, 44 (10): 234-236.  doi:10.11896/j.issn.1002-137X.2017.10.042
With the popularity of the network, searching online has become the main way to obtain information. Compared with an independent search engine, which usually has limited coverage, a meta search engine can better meet information retrieval needs. When a query is input into the unified interface provided by meta search, the query is first processed and then sent to appropriate member search engines. An important problem is how to find the underlying search engines that can best answer the user query. In this paper, we proposed a mechanism based on a genetic algorithm that also takes the weight of each member search engine into account. The experimental results show that our method indeed improves the efficiency and accuracy of engine selection.
Ecological Pyramid Particle Swarm Optimization
LIU Ya-hong, ZHANG Wei and FAN Lv-bin
Computer Science. 2017, 44 (10): 237-244.  doi:10.11896/j.issn.1002-137X.2017.10.043
A novel ecological pyramid particle swarm optimization (EP-PSO) variant was proposed to deal with high-dimensional, complex optimization problems. In the new variant, an ecological pyramid system is introduced to improve particle diversity, and variation on both the local exemplar and the global exemplar is employed to extend the search space. To verify the effectiveness of the algorithm, fifteen benchmark problems were used to test the performance of EP-PSO. Experimental results validate its outstanding performance: compared with other algorithms, EP-PSO not only obtains highly accurate solutions but also achieves high efficiency and reliability.
Determining Optimal Number of Subprocesses in Business Process Model Abstraction
SUN Shanwu and WANG Nan
Computer Science. 2017, 44 (10): 245-248.  doi:10.11896/j.issn.1002-137X.2017.10.044
According to the characteristics of the business process model, this paper proposed a method to determine the optimal number of subprocesses based on the k-means activity clustering algorithm with the two different constraints given in previous work. Combining an assumption about the process structure with a threshold restriction on activity semantics, a method for determining an appropriate upper bound on the number of subprocesses is given in order to reduce the number of iterations. As the value of k changes, based on the structural compactness of the subprocesses and the refined process structure tree, an incremental approach is designed to simplify the updating of the cluster centers. A reasonable index is designed to evaluate the abstract result model, from which the optimal number of subprocesses is derived. The proposed method was applied to a process model repository in use, and the resulting number of subprocesses is very close to that given by the modelers involved.
Chinese Medical Weak Supervised Relation Extraction Based on Convolution Neural Network
LIU Kai, FU Hai-dong, ZOU Yu-wei and GU Jin-guang
Computer Science. 2017, 44 (10): 249-253.  doi:10.11896/j.issn.1002-137X.2017.10.045
Abstract PDF(1176KB) ( 700 )   
References | Related Articles | Metrics
As the medical field receives more and more attention, natural language processing theory and applications have begun to expand into it, and the application of information extraction technology in this field has become a research hotspot. Focusing on entity relation extraction in the medical domain, this paper proposed a weakly supervised relation extraction method based on a convolutional neural network. The method adds artificial rules to the training corpus carrying entity relation labels, transforms the weakly labeled training corpus into a vector feature matrix, feeds the matrix into the convolutional neural network to train a classification model, and finally performs entity relation extraction. The experimental results show that the method is more accurate and efficient than conventional machine learning methods.
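A minimal sketch of a CNN relation classifier over sentence matrices is shown below; the rule-based weak labeling, corpus, and exact architecture are the paper's own and are not reproduced, so all dimensions and layer choices here are assumptions:

```python
# Sketch of a convolutional relation classifier: embed tokens, convolve,
# max-pool over time, and emit relation logits.
import torch
import torch.nn as nn

class RelationCNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, n_filters=64,
                 kernel=3, n_relations=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel, padding=1)
        self.fc = nn.Linear(n_filters, n_relations)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)   # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = x.max(dim=2).values                # max-over-time pooling
        return self.fc(x)                      # relation logits

model = RelationCNN()
logits = model(torch.randint(0, 5000, (8, 40)))  # batch of 8 sentences
print(logits.shape)  # torch.Size([8, 5])
```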
Review Spam Detection Approach Based on Topic Model and Sentiment Analysis
JIN Xiang-hong, LI Lin and ZHONG Luo
Computer Science. 2017, 44 (10): 254-258.  doi:10.11896/j.issn.1002-137X.2017.10.046
Abstract PDF(1172KB) ( 788 )   
References | Related Articles | Metrics
With the rapid development of e-commerce, consumers increasingly accept online shopping, and product reviews have a great influence on their purchase decisions. Product reviews are the evaluations or comments on items written by online shoppers, and they often include review spam that can hurt the shopping experience. Review spam detection has therefore become an important problem for improving service quality. In this paper, a review spam detection approach called LDA-SP (LDA-sentiment polarity) was proposed by carefully analyzing the main characteristics of review spam. First, we used the LDA topic model to filter out irrelevant reviews, and then applied sentiment analysis to identify untruthful reviews. Experiments were conducted on a large set of reviews from an online shopping mall. The results show that the detection accuracy of LDA-SP is higher than that of the traditional LDA topic model alone and of sentiment polarity analysis alone. It can effectively detect review spam, so that more objective and accurate product information is displayed to e-commerce users.
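A sketch of the two-stage idea follows: topic relevance filtering with LDA, then a lexicon-based polarity check. The thresholds, lexicon, and product-topic mapping are all assumptions; the paper's exact pipeline is not shown:

```python
# Two-stage spam candidate check: off-topic reviews are irrelevant; extreme
# polarity on-topic reviews are flagged as possibly untruthful.
from gensim import corpora
from gensim.models import LdaModel

docs = [["screen", "bright", "battery"], ["shipping", "late", "refund"],
        ["great", "phone", "camera"], ["terrible", "seller", "scam"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

POS, NEG = {"great", "bright"}, {"terrible", "late", "scam"}  # toy lexicon

def is_spam_candidate(doc, product_topic, min_relevance=0.5):
    topics = dict(lda.get_document_topics(dictionary.doc2bow(doc)))
    if topics.get(product_topic, 0.0) < min_relevance:
        return True                       # off-topic: irrelevant review
    polarity = sum(w in POS for w in doc) - sum(w in NEG for w in doc)
    return abs(polarity) > 2              # extreme polarity: possibly untruthful

print([is_spam_candidate(d, product_topic=0) for d in docs])
```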
Effect of Preprocessing on Corpus of Mongolian-Chinese Statistical Machine Translation
LI Jin-ting, HOU Hong-xu, WU Jing, WANG Hong-bin and FAN Wen-ting
Computer Science. 2017, 44 (10): 259-264.  doi:10.11896/j.issn.1002-137X.2017.10.047
Abstract PDF(1145KB) ( 618 )   
References | Related Articles | Metrics
Traditional morphological preprocessing methods use Mongolian suffix segmentation and stemming, which leads to semantic loss in the words. The Case component is a special additional component of the Mongolian word suffix that represents only the syntactic information of the sentence, not the semantic information of the words. Inappropriate preprocessing of the Case component causes data sparsity in machine translation training. Therefore, we surveyed the existing corpus preprocessing methods for Mongolian morphology and compared their results. Our methods mainly focus on the effect of Case processing and improve the performance of the Mongolian-Chinese SMT system by 3.22 BLEU points relative to the baseline system.
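As a hypothetical illustration of separating suffixes before SMT training (not the paper's method): in traditional Mongolian script text, suffixes are commonly written after a narrow no-break space (U+202F), so a simple segmenter can detach them into their own tokens; the token names below are placeholders:

```python
# Hypothetical Case-suffix segmentation sketch. The NNBSP convention is used
# as the split point; real preprocessing would also decide which suffixes to
# keep, merge, or drop, which the abstract leaves unspecified.
NNBSP = "\u202f"

def split_case_suffix(token):
    # Detach a suffix written after the narrow no-break space into its own
    # token, so the stem keeps its semantics and the Case marker the syntax.
    if NNBSP in token:
        stem, suffix = token.rsplit(NNBSP, 1)
        return [stem, "+" + suffix]
    return [token]

sentence = ["stem1" + NNBSP + "case1", "stem2"]   # placeholder tokens
print([t for tok in sentence for t in split_case_suffix(tok)])
```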
Cognitive Modeling Based on Binary Matrix Factorization
ZHANG Meng, FU Li-hua, HE Ting-ting and YANG Qing
Computer Science. 2017, 44 (10): 265-268.  doi:10.11896/j.issn.1002-137X.2017.10.048
Abstract PDF(1207KB) ( 787 )   
References | Related Articles | Metrics
A novel logistic binary matrix factorization (LBMF) model was proposed to predict students' performance and to classify exam items. In addition, a new algorithm was designed to tackle the non-convex optimization problem involved in LBMF. Experiments were performed on both simulated data and real data. The results indicate that LBMF can not only predict students' academic performance but also classify examination items according to the knowledge points they require, and that LBMF significantly outperforms existing algorithms in these applications.
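A minimal sketch of logistic binary matrix factorization by plain gradient descent follows; the rank, learning rate, and data are illustrative, and the paper's specialized solver for the non-convex problem is not reproduced:

```python
# Factorize a binary student-item matrix R as sigmoid(U V^T) by minimizing
# the logistic (cross-entropy) loss with L2 regularization.
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((50, 20)) > 0.5).astype(float)  # binary student-item matrix
rank, lr, reg = 4, 0.05, 0.01
U = 0.1 * rng.standard_normal((50, rank))       # student factors
V = 0.1 * rng.standard_normal((20, rank))       # item (knowledge-point) factors

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(500):
    E = sigmoid(U @ V.T) - R                    # gradient of the logistic loss
    U -= lr * (E @ V + reg * U)
    V -= lr * (E.T @ U + reg * V)

prediction = sigmoid(U @ V.T) > 0.5             # predicted pass/fail per item
print(f"training accuracy: {np.mean(prediction == R):.3f}")
```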
Single Line Transit Mixed Scheduling Model Based on Vehicle Departure Timetable
WANG Yang and SHEN Ji-quan
Computer Science. 2017, 44 (10): 269-275.  doi:10.11896/j.issn.1002-137X.2017.10.049
Abstract PDF(1439KB) ( 770 )   
References | Related Articles | Metrics
To address the shortcomings in the classification of passengers and the handling of stranded passengers in the single-line mixed transit model with express buses, this paper supplemented the model in three steps. First, the composition and transformation of the passenger flow were systematically discussed; meanwhile, a method based on the number of buses a stranded passenger has waited for, the reason for being stranded, and the distance to the destination station was put forward to model the distribution of stranded passengers, and the time cost of the stranded passengers was calculated by this method. Second, the departure timetable was established from the combination of three variables (vehicle type, departure mode, and headway); the operating timetable variables were then calculated from this departure timetable, followed by the bus service indicators and the passenger- and vehicle-related costs. Finally, according to the characteristics of the problem, the max-min ant colony system algorithm was used to solve the model. With a given number of vehicles and a given time period, a comparative experiment analyzed the optimal timetable of four scheduling strategies together with the corresponding bus service levels and total system cost. The experimental results demonstrate that the proposed model balances the passenger flow and minimizes the cost of passenger time and vehicle fuel consumption.
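For reference, a sketch of the pheromone rule that distinguishes max-min ant systems is given below: only the best solution deposits pheromone, and trails are clamped to an interval. The timetable encoding and cost model are paper-specific and omitted, and all parameters are assumed values:

```python
# Max-min ant system pheromone update: evaporate everywhere, deposit only
# along the best tour, then clamp to [tau_min, tau_max].
import numpy as np

def mmas_update(pheromone, best_tour, best_cost, rho=0.1,
                tau_min=0.01, tau_max=10.0):
    pheromone *= (1.0 - rho)                 # evaporation on every edge
    for i, j in best_tour:                   # only the best solution deposits
        pheromone[i, j] += 1.0 / best_cost
    return np.clip(pheromone, tau_min, tau_max)

pheromone = np.full((5, 5), 10.0)            # initialized at tau_max
pheromone = mmas_update(pheromone, best_tour=[(0, 2), (2, 4)], best_cost=3.5)
print(pheromone[0, 2], pheromone[1, 3])
```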
Text Data Preprocessing Based on Term Frequency Statistics Rules
CHI Yun-xian, ZHAO Shu-liang, LUO Yan, GAO Lin, ZHAO Jun-peng and LI Chao
Computer Science. 2017, 44 (10): 276-282.  doi:10.11896/j.issn.1002-137X.2017.10.050
Abstract PDF(1634KB) ( 1103 )   
References | Related Articles | Metrics
In the age of big data, feature terms in text mining face a severe "high-dimension and sparse" challenge. The contradiction between the enormous scale of the term set and the scarcity of features causes high time-space complexity and seriously restricts the efficiency of text mining, so it is crucial to preprocess the data before mining text. Traditional text mining algorithms perform only term segmentation and stop-word removal during data preprocessing. To improve this process, a data preprocessing algorithm based on term frequency statistics rules (DPTFSR) was proposed. First, an expression for the number of terms with identical frequency is derived from Zipf's law and the rule of maximum area. Next, the distribution regularities of terms with identical frequency are explored; it is discovered that the proportion of low-frequency terms in documents reaches up to 2/3, yet these terms carry little relevance. Finally, the data is preprocessed according to the term frequency statistics rules: low-frequency terms are deleted, and the feature dimension is decreased greatly. The correctness of the term frequency statistics rules and the validity of the DPTFSR algorithm were verified on the Reuters-21578 and 20-Newsgroups datasets. Experimental results show that accuracy, precision, recall, and F1 measure are increased and running time is obviously shortened, so the efficiency of text mining is significantly enhanced.
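The core preprocessing step reduces to counting term frequencies and dropping the long tail; a minimal sketch follows, where the cut-off is an assumed parameter rather than the paper's Zipf-derived threshold:

```python
# Count term frequencies across the corpus and delete low-frequency terms.
from collections import Counter

docs = [["data", "mining", "text", "data"],
        ["text", "feature", "sparse"],
        ["data", "preprocessing", "text"]]

freq = Counter(term for doc in docs for term in doc)
min_freq = 2                                     # assumed frequency threshold
kept = {t for t, c in freq.items() if c >= min_freq}
pruned_docs = [[t for t in doc if t in kept] for doc in docs]

print("vocabulary:", len(freq), "->", len(kept))
print(pruned_docs)
```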
Method of Short Text Opinion Recognition Based on Feature Extension and Deep Learning
DU Yong-ping, CHEN Shou-qin and ZHAO Xiao-zheng
Computer Science. 2017, 44 (10): 283-288.  doi:10.11896/j.issn.1002-137X.2017.10.051
Abstract PDF(1492KB) ( 643 )   
References | Related Articles | Metrics
This paper put forward an opinion recognition method for microblog short texts, which contain little information and have sparse features. The review and repost information of a microblog is used to reconstruct the original text, and Word2vec is adopted to cluster similar sentiment words for feature extension. The features are then learned by a deep belief network, which yields high-quality sentiment features. Experimental results on the COAE (Chinese Opinion Analysis Evaluation) 2015 data show that our method alleviates the feature sparseness problem and mines more effective sentiment features. The system performance is improved, with a precision of 64.1%.
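A sketch of Word2vec-based feature extension is shown below: each sentiment word in a sparse short text is expanded with its nearest neighbors in the embedding space. The corpus and hyperparameters are placeholders, and the paper's deep belief network stage is not shown:

```python
# Expand a short text with the most similar words of its tokens, so sparse
# sentiment features gain related terms before classification.
from gensim.models import Word2Vec

corpus = [["happy", "great", "day"], ["sad", "bad", "news"],
          ["great", "happy", "win"], ["bad", "sad", "loss"]] * 50
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, seed=0)

def extend_features(tokens, topn=2):
    extended = list(tokens)
    for t in tokens:
        if t in model.wv:
            extended += [w for w, _ in model.wv.most_similar(t, topn=topn)]
    return extended

print(extend_features(["happy", "win"]))
```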
Multi-label Feature Selection Algorithm Based on Label Weighting
LIN Meng-lei, LIU Jing-hua, WANG Chen-xi and LIN Yao-jin
Computer Science. 2017, 44 (10): 289-295.  doi:10.11896/j.issn.1002-137X.2017.10.052
Abstract PDF(1591KB) ( 783 )   
References | Related Articles | Metrics
In multi-label learning, each sample is described by a feature vector and simultaneously associated with multiple class labels. Feature selection can remove irrelevant and redundant features, and is an effective means of overcoming the curse of dimensionality for multi-label data. Different labels have different separability over the samples, which may provide useful information for multi-label learning. Based on this assumption, a multi-label feature selection algorithm based on label weighting was proposed in this paper. First, the margin of each sample in the full feature space is calculated and used as the label weighting. Then the distinguishability of each feature with respect to the label set is used to compute a feature weighting that measures the importance of the feature. Finally, all features are sorted by their feature weighting. Experiments were conducted on four multi-label datasets with four evaluation criteria. The results show that the proposed algorithm is superior to several state-of-the-art multi-label feature selection algorithms.
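A Relief-style sketch of margin-based feature weighting follows; the paper's exact label-weighting scheme is not given in the abstract, so a per-label nearest-hit/nearest-miss margin is used as a stand-in:

```python
# For each sample and label, compare the nearest same-label and nearest
# different-label neighbors; features that separate them get higher weight.
import numpy as np

def feature_weights(X, Y):
    n, d = X.shape
    w = np.zeros(d)
    for lbl in range(Y.shape[1]):
        for i in range(n):
            same = (Y[:, lbl] == Y[i, lbl])
            same[i] = False
            diff = ~same & (np.arange(n) != i)
            if not same.any() or not diff.any():
                continue
            dist = np.abs(X - X[i]).sum(axis=1)
            hit = X[same][dist[same].argmin()]    # nearest same-label sample
            miss = X[diff][dist[diff].argmin()]   # nearest different-label sample
            w += np.abs(X[i] - miss) - np.abs(X[i] - hit)  # per-feature margin
    return w

X = np.random.rand(30, 5)
Y = (np.random.rand(30, 3) > 0.5).astype(int)
print(np.argsort(feature_weights(X, Y))[::-1])    # features ranked by weight
```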
Extraction Method of Sentimental Feature Vector Based on Semantic Similarity
LIN Jiang-hao, ZHOU Yong-mei, YANG Ai-min and CHENG Jin
Computer Science. 2017, 44 (10): 296-301.  doi:10.11896/j.issn.1002-137X.2017.10.053
Abstract PDF(1378KB) ( 932 )   
References | Related Articles | Metrics
To address the gaps in semantic representation and domain adaptation of sentiment features, an extraction method for sentiment feature vectors based on semantic similarity was proposed in this paper. First, a Word2vec model is trained on 250 thousand Sogou news texts and 500 thousand microblog texts. Eighty sentiment words with clear polarity, rich content, and diverse parts of speech are chosen as the seed word set. Then the semantic similarity between candidate sentiment words and the seed words is calculated from their word vectors, the sentiment words are mapped into a high-dimensional vector space, and the feature vector representation (Senti2vec) is extracted. Senti2vec was applied to the similarity analysis of sentiment synonyms and antonyms, the polarity classification of sentiment words, and sentiment text analysis. The experimental results show that Senti2vec can represent both the meaning and the sentiment of sentiment words. Because Senti2vec relies on semantic similarity computed from large-scale data, it adapts readily to different domains.
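A sketch of the seed-similarity construction follows: each candidate word is represented by its cosine similarity to every seed word, yielding a seed-sized feature vector. The embeddings and seed list below are random placeholders, not the trained Word2vec model or the 80 seeds of the paper:

```python
# Build a sentiment feature vector as cosine similarities to seed words.
import numpy as np

rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(50) for w in
         ["good", "bad", "happy", "awful", "excellent"]}
seeds = ["good", "bad"]                     # stand-in for the 80 seed words

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def senti_vector(word):
    # One similarity score per seed word forms the feature vector.
    return np.array([cosine(embed[word], embed[s]) for s in seeds])

print(senti_vector("excellent"))
```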
Double Weighted Collaborative Representation Based Classification for Crop Leaf Disease Image Recognition
DU Hai-shun, JIANG Man-man, WANG Juan and WANG Sheng
Computer Science. 2017, 44 (10): 302-306.  doi:10.11896/j.issn.1002-137X.2017.10.054
Abstract PDF(1418KB) ( 679 )   
References | Related Articles | Metrics
Crop disease is one of the main agricultural disasters in our country, and recognizing the category of a crop disease is critical for its prevention and control. In this paper, we acquired 441 images covering 22 kinds of crop leaf diseases of wheat, maize, peanut, and cotton. For each crop leaf disease image, we extracted leaf and disease spot features after the leaf and the disease spot had been segmented out, combined them into a feature vector, and normalized the vector by max-min normalization. Using the feature vectors of all images, we constructed a crop leaf disease dataset. By considering both the importance of data features and data locality, we proposed a double weighted collaborative representation based classification (DWCRC) method for crop leaf disease recognition. Experimental results on the crop leaf disease dataset show that DWCRC is more effective than state-of-the-art methods for crop leaf disease recognition.
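For orientation, a sketch of plain collaborative representation classification (CRC) is given below; the paper's two weighting matrices (feature importance and data locality) are omitted, so this shows only the ridge-coding-plus-residual backbone that DWCRC extends:

```python
# CRC backbone: code the test sample over all training samples with ridge
# regression, then assign it to the class with the smallest reconstruction
# residual.
import numpy as np

def crc_predict(X, labels, y, lam=0.01):
    # X: (d, n) training features as columns; y: (d,) test sample.
    n = X.shape[1]
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best_cls, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        res = np.linalg.norm(y - X[:, mask] @ alpha[mask])  # class-wise residual
        if res < best_res:
            best_cls, best_res = c, res
    return best_cls

rng = np.random.default_rng(1)
X = rng.random((30, 44))                  # 44 training images, 30-dim features
labels = np.repeat(np.arange(22), 2)      # 22 disease classes, 2 samples each
print(crc_predict(X, labels, X[:, 7]))    # should recover class 3
```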
Marking Points Tracking for 3D Dynamic Data Correspondence
PAN Xiang, LIN Jun-mian, WANG Xue-cheng, LIU Zhi and ZHOU Xiao-long
Computer Science. 2017, 44 (10): 307-311.  doi:10.11896/j.issn.1002-137X.2017.10.055
Abstract PDF(1387KB) ( 536 )   
References | Related Articles | Metrics
To address the incorrect correspondences produced by feature point matching in 3D animation, this paper proposed interactive marking and motion tracking to improve the reliability and stability of feature point matching. First, the algorithm marks points on some specified frames. Then it obtains their positions on the other frames through motion tracking with an optimal prediction window. Finally, the tracked points are used to build an isometric bipartite graph for the final correspondence. In experiments, the algorithm achieves better alignment accuracy than existing algorithms.
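A sketch of the final bipartite-matching step follows: tracked marker positions on two frames are matched by minimizing total pairwise cost. The paper's isometric (geodesic-preserving) cost is simplified here to Euclidean distance, and the point sets are synthetic:

```python
# Match marker points across two frames with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

frame_a = np.random.rand(6, 3)            # tracked marker points, frame A
frame_b = frame_a[::-1] + 0.01            # same points, permuted and jittered

cost = cdist(frame_a, frame_b)            # pairwise Euclidean cost matrix
rows, cols = linear_sum_assignment(cost)  # optimal one-to-one correspondence
print(list(zip(rows, cols)))              # expected: i matched to 5 - i
```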
Video Saliency Detection Based on Compressed Domain Coding Length
ZHANG Zhao-feng, WU Ze-min, DU Lin and HU Lei
Computer Science. 2017, 44 (10): 312-317.  doi:10.11896/j.issn.1002-137X.2017.10.056
Abstract PDF(1438KB) ( 615 )   
References | Related Articles | Metrics
Biological studies show that people pay most of their attention to moving objects when watching a video. To simulate this property and detect salient regions rapidly, we proposed the temporal-spatial saliency in compressed domain model (TS2CD). Using the H.264 residual coding length and the motion vector coding length respectively, we simulated the intensity of salient stimuli and obtained the video saliency features. Finally, a linear weighted fusion algorithm produces the final video saliency maps. Experimental results on three public datasets demonstrate that our model outperforms state-of-the-art methods.
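The fusion step itself is simple, as the sketch below shows: the two per-block feature maps are normalized and blended. The fusion weight and map sizes are assumed values, not the paper's:

```python
# Linear weighted fusion of two compressed-domain feature maps into one
# saliency map.
import numpy as np

def normalize(m):
    return (m - m.min()) / (m.max() - m.min() + 1e-9)

residual_len = np.random.rand(45, 80)     # residual bits per block (placeholder)
motion_len = np.random.rand(45, 80)       # motion-vector bits per block

w = 0.6                                    # assumed fusion weight
saliency = w * normalize(residual_len) + (1 - w) * normalize(motion_len)
print(saliency.shape, saliency.min(), saliency.max())
```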
Research on Low-resource Mongolian Speech Recognition
ZHANG Ai-ying and NI Chong-jia
Computer Science. 2017, 44 (10): 318-322.  doi:10.11896/j.issn.1002-137X.2017.10.057
Abstract PDF(1254KB) ( 658 )   
References | Related Articles | Metrics
With the development of speech recognition technology, research on low-resource speech recognition has gained extensive attention. Taking Mongolian as the target language, we studied how to use multilingual information to improve speech recognition performance in a low-resource condition where, for example, only 10 hours of transcribed speech are available for acoustic modeling. A more discriminative acoustic model can be obtained through cross-lingual transfer of a multilingual deep neural network and multilingual deep bottleneck features. Large numbers of web pages can be collected with a web search engine and a web crawler, providing large amounts of text for improving the language model. Fusing the outputs of different recognizers further improves the recognition results. Compared with the baseline system, the fused result achieves a nearly 12% absolute word error rate (WER) reduction.
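For reference, the WER metric behind the reported 12% absolute reduction is the word-level Levenshtein distance divided by the reference length; a minimal sketch:

```python
# Word error rate: edit distance over words, normalized by reference length.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("this is a test", "this is test"))  # 0.25: one deletion, four words
```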