Computer Science

Survey on Cost-sensitive Deep Learning Methods

WU Yu-xi, WANG Jun-li, YANG Li, YU Miao-miao

Computer Science. 2019, 46 (5): 1-12. doi:10.11896/j.issn.1002-137X.2019.05.001

Cost-sensitive learning can effectively alleviate the problem of data imbalance in classification tasks and has been successfully applied to various traditional machine learning techniques. With the continuous development of deep learning, cost-sensitive methods have again become a research hotspot. Combining deep learning with cost-sensitive methods can not only break through the limitations of traditional machine learning, but also improve the data sensitivity and classification accuracy of the model, especially when the data are imbalanced. However, how to combine the two effectively has become the focus and difficulty of the research. Researchers have improved the performance of deep learning models combined with cost-sensitive methods from three aspects: network structure, loss function and training method. This paper describes the development of the combination of deep learning and cost-sensitive methods in detail, analyzes several innovative models and compares their classification performance. Finally, the development trend of combining deep learning with cost-sensitive methods is discussed.
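One common way to make a loss function cost-sensitive is to scale each sample's cross-entropy by a cost attached to its true class. The sketch below illustrates that general idea only; it is not taken from any model surveyed in the paper, and the function name, the probabilities and the cost values are all hypothetical.

```python
import math

def cost_sensitive_cross_entropy(probs, labels, class_costs):
    """Mean cross-entropy with each sample scaled by the cost of its true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_costs[y] * math.log(p[y])
    return total / len(labels)

# Hypothetical predictions: two majority-class samples (0), one minority sample (1).
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
labels = [0, 0, 1]

uniform = cost_sensitive_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted = cost_sensitive_cross_entropy(probs, labels, {0: 1.0, 1: 5.0})
```

With the higher cost on class 1, the misclassified minority sample contributes more to the loss, so `weighted` exceeds `uniform`.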

Research Progress on Techniques for Concurrency Bug Detection

BO Li-li, JIANG Shu-juan, ZHANG Yan-mei, WANG Xing-ya, YU Qiao

Computer Science. 2019, 46 (5): 13-20. doi:10.11896/j.issn.1002-137X.2019.05.002

The advent of the multi-core era has made concurrent programming increasingly popular. However, concurrent programs are prone to concurrency bugs because of their inherent concurrency and non-deterministic thread scheduling, so it is critical to detect concurrency bugs effectively and efficiently. First, concurrency bugs are divided into five categories (type-state bug, deadlock, data race, atomicity violation and order violation). Then, concurrency bug detection techniques are classified into static analysis, dynamic analysis and combined analysis in terms of whether the program is run, each with a detailed analysis, comparison and summary. Next, the universality of the existing detection techniques is analyzed. Finally, future research directions for concurrency bug detection are discussed.

Survey on Trustworthy Compilers for Synchronous Data-flow Languages

YANG Ping, WANG Sheng-yuan

Computer Science. 2019, 46 (5): 21-28. doi:10.11896/j.issn.1002-137X.2019.05.003

Synchronous data-flow languages, such as Lustre and Signal, have been widely used in safety-critical industrial areas such as aircraft, high-speed railways and nuclear power plants. Hence, the safety of the development tools for such languages has received close attention. A trustworthy compiler from a synchronous data-flow language to a sequential imperative language is a typical tool of this kind, for example in Scade. There are two ways to implement a trustworthy compiler: the traditional method, for instance extensive testing and strict process management, and the formal method, for example formal verification of the compiler itself and translation validation. The formal method has received much attention in recent years because it has been widely studied as the critical approach to constructing and verifying a trustworthy compiler, and it is expected to solve the “miscompilation” problem to the greatest extent. After introducing the formal methods for constructing and verifying a trustworthy compiler, this paper surveys and analyzes the research work and current status of trustworthy compilers for synchronous data-flow languages.

Research on Privacy Protection Strategies in Blockchain Application

DONG Gui-shan, CHEN Yu-xiang, FAN Jia, HAO Yao, LI Feng

Computer Science. 2019, 46 (5): 29-35. doi:10.11896/j.issn.1002-137X.2019.05.004

In recent years, more and more privacy protection requirements have been put forward for identity management systems and user-centric self-sovereign identity. As an important means of solving privacy protection problems, blockchain is used by more and more applications. Addressing privacy protection in blockchain applications, this paper first studies the privacy protection strategies of mainstream cryptocurrencies, including anonymization of the sender, receiver, content and other links, setting of blockchain access rights, innovative methods such as side chains and payment channels, and classified storage of data. Then, the efficiency, emphasis and application prospects of each privacy protection strategy are analyzed. In particular, the importance of zero-knowledge proofs to blockchain-based distributed applications is analyzed. Finally, this paper introduces and analyzes privacy protection strategies in smart contracts, identity management, supply chains and other practical fields, and outlines future directions.

Survey on Sequence Assembly Algorithms in High-throughput Sequencing

ZHOU Wei-xing, SHI Hai-he

Computer Science. 2019, 46 (5): 36-43. doi:10.11896/j.issn.1002-137X.2019.05.005

High-throughput sequencing technology, also known as next-generation sequencing technology, is a sequencing method developed after first-generation sequencing technology. Unlike the automatic and semi-automatic capillary sequencing methods based on Sanger sequencing, high-throughput sequencing adopts parallel sequencing based on pyrosequencing. It not only overcomes the high cost, low throughput and low speed of first-generation sequencing, but also meets the demands of the rapid development of modern molecular biology and genomics with its low cost, high throughput and high speed. Compared with first-generation sequencing data, high-throughput sequencing data are characterized by short read lengths, uneven coverage and low accuracy. Third-generation sequencing technology adopts more efficient single-molecule real-time sequencing and nanopore sequencing as well as the principle of sequencing by synthesis, and has the advantages of high throughput, low cost and long reads. Therefore, in order to obtain a complete genome sequence, a technique is needed to assemble short sequencing reads into a complete single-stranded sequence of genes; sequence assembly algorithms were proposed for this purpose. Firstly, the development background of sequence assembly algorithms and the related concepts of high-throughput sequencing technology are introduced, and the advantages of high-throughput sequencing technology for sequence assembly are analyzed. Secondly, the development of sequence assembly algorithms is summarized, and the algorithms are illustrated by class: the greedy strategy, the Overlap-Layout-Consensus (OLC) strategy and the De Bruijn Graph (DBG) strategy. Finally, the research directions and development trends of sequence assembly algorithms are discussed.
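As a rough illustration of the De Bruijn Graph (DBG) strategy named above, the sketch below builds the graph in its textbook form: every k-mer in a read becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix. It covers graph construction only, not path finding or error correction, and the reads and the value of k are made up for the example.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)

# Two hypothetical overlapping reads.
graph = de_bruijn_graph(["ACGT", "CGTA"], 3)
```

Here the edge from `CG` to `GT` appears twice because both reads contain the k-mer `CGT`; assemblers typically use such multiplicities as coverage information.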

Optimization Model of Working Mode Transformation Strategies for Wireless Sensor Nodes

ZHAO Ning-bo, LIU Wei, LUO Rong, HU Shun-ren

Computer Science. 2019, 46 (5): 44-49. doi:10.11896/j.issn.1002-137X.2019.05.006

Transforming the working modes of a wireless sensor network is an efficient way to raise energy efficiency, but current control strategies have reached a plateau, with much manual intervention and few indicators. Combining the finite state machine algorithm and the reinforcement learning algorithm, this paper establishes a working mode transformation model. Based on this model, with energy consumption and data throughput as the two indicators, this paper uses a difference matrix to evaluate transformation strategies, constructs a characteristic function to estimate energy efficiency, and then establishes an optimization model that judges the strategies in two steps. Compared with the common control strategy, the model reduces energy consumption by about 57% while losing only about 14% of data throughput. It significantly improves energy efficiency and provides model support and theoretical guidance for node working mode control.

Two-phase Multi-target Localization Algorithm Based on Compressed Sensing

LI Xiu-qin, WANG Tian-jing, BAI Guang-wei, SHEN Hang

Computer Science. 2019, 46 (5): 50-56. doi:10.11896/j.issn.1002-137X.2019.05.007

RSS-based multi-target localization in wireless sensor networks is naturally sparse. This paper proposes a two-phase multi-target localization algorithm based on compressed sensing. The algorithm divides the grid-based target localization problem into two phases: a coarse localization phase and a fine localization phase. In the coarse localization phase, the optimal number of measurements is determined by sequential compressed sensing, and then the locations of the initial candidate grids are reconstructed by l_p optimization. In the fine localization phase, all candidate grids are repeatedly divided by the quadripartition method, and the accurate locations of the targets in the corresponding candidate grids are estimated using the minimum residual principle. Simulation results show that, compared with the traditional multi-target localization algorithm using l_1 optimization, the proposed algorithm has better localization performance when the number of targets is unknown, and the localization time is significantly reduced.

Node Encounter Interval Based Buffer Management Strategy in Opportunistic Networks

ZHANG Feng

Computer Science. 2019, 46 (5): 57-61. doi:10.11896/j.issn.1002-137X.2019.05.008

Opportunistic networks, which employ the store-carry-and-forward pattern, do not consider the encounter probability between destination nodes and passing nodes during message transmission, which leads to a large deviation in the estimated message transmission status. This paper proposes an encounter-interval-based buffer management strategy. It exploits the exponential distribution of the encounter intervals between nodes and also considers the number of message copies in order to estimate the average delivery probability. When the buffer overflows, all messages stored in the buffer are sorted by average delivery probability, and the messages with the lowest delivery probability are dropped first. Simulation results show that the proposed strategy outperforms existing schemes in terms of delivery ratio, average latency and overhead ratio.
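The drop policy described above can be sketched as follows. The abstract does not give the paper's estimator, so the probability formula here is an assumption: with exponentially distributed encounter intervals of rate λ, the chance that at least one of n copies meets the destination within the remaining lifetime T is taken to be 1 - exp(-n·λ·T). The message fields `rate`, `copies` and `ttl` are hypothetical names.

```python
import math

def delivery_probability(rate, copies, remaining_ttl):
    """Assumed estimator: P(at least one of `copies` carriers meets the destination in time)."""
    return 1.0 - math.exp(-rate * copies * remaining_ttl)

def manage_buffer(buffer, capacity):
    """Sort messages by estimated delivery probability and drop the lowest on overflow."""
    ranked = sorted(
        buffer,
        key=lambda m: delivery_probability(m["rate"], m["copies"], m["ttl"]),
        reverse=True,
    )
    return ranked[:capacity], ranked[capacity:]

# Hypothetical buffer with room for two messages.
buffer = [
    {"id": "m1", "rate": 0.01, "copies": 1, "ttl": 50},
    {"id": "m2", "rate": 0.01, "copies": 4, "ttl": 50},
    {"id": "m3", "rate": 0.02, "copies": 2, "ttl": 10},
]
kept, dropped = manage_buffer(buffer, capacity=2)
```

In this example `m3` has the lowest estimated delivery probability and is the one dropped.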

Graph Theory Based Interference Coordination for H2H/M2M Coexisting Scenarios

SUI Nan-nan, XU You-yun, WANG Cong, XIE Wei, ZHU Yun

Computer Science. 2019, 46 (5): 62-66. doi:10.11896/j.issn.1002-137X.2019.05.009

For scenarios in which human-to-human (H2H) and machine-to-machine (M2M) communications coexist in co-channel deployed LTE-A heterogeneous networks, a maximum independent set (MIS) based interference coordination and resource block (RB) expansion allocation algorithm (CGMMIS) is proposed to maximize the system sum rate while ensuring RB allocation continuity. Firstly, the interference graph is obtained according to the relative interference between each pair of nodes. Secondly, the CGMMIS algorithm divides nodes that interfere strongly with each other into different MISs and maximizes the sum channel gain of the nodes in each MIS. However, a node may then belong to multiple MISs. Lastly, to guarantee the consecutive allocation of RBs, CGMMIS uses the RB expansion allocation method, in which a node selects only the MIS that maximizes its achievable rate. Simulation results demonstrate that, in dense deployments of M2M devices, the proposed CGMMIS algorithm is superior to both the non-cooperative algorithm and the random graph coloring based MIS search algorithm in terms of system sum rate.

K-level Region Coverage Enhancement Algorithm Based on Irregular Division

JIANG Yi-bo, HE Cheng-long, MEI Jia-dong, WANG Nian-hua

Computer Science. 2019, 46 (5): 67-72. doi:10.11896/j.issn.1002-137X.2019.05.010

Based on an in-depth analysis and comparison of existing K-level area coverage algorithms that reduce the number of activated sensor nodes, the entire monitoring region is divided using the boundaries of the node sensing regions. A scanning method is introduced to quickly determine the set of basic segmentation cells within each node’s sensing zone, and a node weight function is designed to decide the activation order. Based on environment variables, the random distribution strategy and other factors, one node is selected to start first. It then drives its neighboring nodes to start so as to achieve K-level coverage of the whole monitoring area. On this basis, this paper further proposes an irregular division area coverage enhancement algorithm (IDACEA). A series of simulation experiments shows that this algorithm can reduce the number of sensor activations while achieving K-level coverage of the monitoring area.

Novel Fault Diagnosis Parallel Algorithm for Hypercube Networks

GUO Yang, LIANG Jia-rong, LIU Feng, XIE Min

Computer Science. 2019, 46 (5): 73-76. doi:10.11896/j.issn.1002-137X.2019.05.011

The hypercube is a valuable interconnection network. To address the high complexity of existing fault diagnosis algorithms for hypercube networks, this paper proposes the concept of a fault fan. A parallel depth-first search algorithm is used to find the fault fan in hypercube networks, and the faulty nodes of the network are identified so that they can be replaced or repaired, which provides a significant way to enhance network reliability. Finally, the complexity of the proposed algorithm is analyzed. It is proved that the time complexity of the algorithm does not exceed O(N), which is far better than algorithms whose complexity is more than quadratic.

DV-Hop Localization Algorithm Based on Grey Wolf Optimization Algorithm with Adaptive Adjustment Strategy

SUN Bo-wen, WEI Su-yuan

Computer Science. 2019, 46 (5): 77-82. doi:10.11896/j.issn.1002-137X.2019.05.012

To address the least squares estimation error in the traditional distance vector-hop (DV-Hop) algorithm for wireless sensor networks, a fusion algorithm of improved grey wolf optimization (GWO) and DV-Hop is proposed. Firstly, the traditional DV-Hop algorithm is used to estimate the distances between beacon nodes and unknown nodes. Secondly, the GWO algorithm with an adaptive strategy replaces the least squares method to estimate the positions of the unknown nodes. The improvements include introducing good-point sets for the initial wolf individuals to improve the ergodicity of the initial population. To speed up the update of the population position, the control parameter a is adjusted adaptively and the population position is updated according to the fitness values of α, β and σ. Finally, a mirroring strategy is adopted to deal with estimated nodes that cross the boundary. Experimental results show that the proposed algorithm has high positioning accuracy and good stability compared with the traditional DV-Hop algorithm and the algorithms of literature [1] and literature [2].
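The first step, distance estimation with traditional DV-Hop, can be sketched as below. This shows only the standard hop-size estimate (total straight-line distance to the other beacons divided by total hop count); the GWO-based position estimation proposed in the paper is not reproduced. The coordinates and hop counts are made up.

```python
import math

def average_hop_size(beacon, other_beacons, hop_counts):
    """Average per-hop distance as seen from one beacon (standard DV-Hop step)."""
    total_distance = sum(
        math.hypot(beacon[0] - x, beacon[1] - y) for x, y in other_beacons
    )
    return total_distance / sum(hop_counts)

def estimated_distance(hop_size, hops_to_beacon):
    """Distance from an unknown node to the beacon, estimated from its hop count."""
    return hop_size * hops_to_beacon

# Hypothetical beacon at the origin, 5 and 10 hops away from two other beacons.
hop_size = average_hop_size((0.0, 0.0), [(30.0, 40.0), (60.0, 80.0)], [5, 10])
distance = estimated_distance(hop_size, 3)  # unknown node 3 hops from the beacon
```

The two beacon distances are 50 and 100, so the hop size is 150 / 15 = 10 and the unknown node is estimated to be 30 units away.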

Collusion Behavior Detection Towards Android Third-party Libraries

ZHANG Jing, LI Rui-xuan, TANG Jun-wei, HAN Hong-mu, GU Xi-wu

Computer Science. 2019, 46 (5): 83-91. doi:10.11896/j.issn.1002-137X.2019.05.013

Third-party libraries are an important part of Android applications. Application developers often introduce third-party libraries with specific functions for rapid development. Concerning the risk of collusion in Android third-party libraries, this paper studies the collusion behavior of Android third-party libraries. Android third-party libraries and applications belong to different interest parties. Communication behaviors hidden in third-party libraries can be considered a special case of application collusion, and they also lead to privilege escalation and component hijacking. Furthermore, these behaviors can cause excessive system consumption and even trigger security threats. This paper presents a systematic survey of the research achievements of domestic and foreign researchers in recent years. First, this paper defines collusion and analyzes the risks of collusion behavior in Android third-party libraries. Then, it presents the design of the Android third-party library collusion behavior detection system in detail. For the 29 third-party libraries in the test set, the experiment shows that the accuracy of this design is 100%, the recall rate is 89.66% and the F-measure is 0.945. In addition, 1207 downloaded third-party libraries were analyzed. The experiments also verify the resource consumption caused by the non-sensitive-information collusion behavior of 41 well-known domestic third-party libraries. Finally, this paper concludes the work and gives an outlook on future work.
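The reported figures are mutually consistent if the 100% value is read as precision, since the F-measure is the harmonic mean of precision and recall. The short check below assumes that reading, and also assumes that 26 of the 29 libraries were detected, a count inferred from the 89.66% recall rather than stated in the abstract.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

recall = 26 / 29            # inferred count; rounds to 89.66%
score = f_measure(1.0, recall)
```

Under these assumptions `score` is 52/55, which rounds to the reported 0.945.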

Malware Detection Algorithm for Improving Active Learning

LI Yi-hong, LIU Fang-zheng, DU Zhen-yu

Computer Science. 2019, 46 (5): 92-99. doi:10.11896/j.issn.1002-137X.2019.05.014

Traditional malware detection technology relies on a large number of labeled samples. However, few labeled samples are usually available for new malware, so traditional machine learning detection methods struggle to achieve good detection results. Therefore, this paper proposes a malware detection algorithm based on active learning. It comprises a sample selection strategy based on Maximum Distance and a sample labeling strategy based on Minimum Risk Estimate, which together achieve better detection results with a small number of labeled samples. Experimental results show that the proposed algorithm performs better than the overall detection method without active learning, and that with 10% of the samples labeled its active learning effect is better than that of the random selection strategy. Moreover, the algorithm has better time performance than the active learning strategy that uses manual labeling.

Anti-eavesdropping Physical Layer Transmission Scheme Based on Time-reversal in D2D Communication Link

LI Fang-wei, ZHOU Jia-wei, ZHANG Hai-bo

Computer Science. 2019, 46 (5): 100-104. doi:10.11896/j.issn.1002-137X.2019.05.015

In view of the physical layer security problem that communication between D2D users is easily eavesdropped, a transmission scheme based on time-reversal (TR) technology is proposed to improve the secrecy rate. Firstly, the MISO wiretap channel model is built, and TR technology is applied on the D2D link to improve the system secrecy rate by exploiting its spatiotemporal focusing characteristics. Secondly, an interference cooperation mechanism is designed to guarantee the secrecy rate of the system. In this mechanism, a Stackelberg game auction model is established to ensure that the interfering users help the D2D users, and the existence of the Nash equilibrium (NE) of the game model is proved. Finally, simulation results show that the proposed secure transmission scheme effectively improves secrecy rate performance compared with existing physical layer transmission schemes.

Risk Modeling for Cyber-physical Systems Based on State/Event Fault Trees

XU Bing-feng, HE Gao-feng, ZHANG Li-ning

Computer Science. 2019, 46 (5): 105-110. doi:10.11896/j.issn.1002-137X.2019.05.016

Cyber-physical systems are prone to network attacks because they embed networked systems, and an attacker may exploit vulnerabilities in the software and communication components to control the system, resulting in system failure. Existing modeling methods that integrate safety and security are built on traditional static fault trees and do not consider the dynamic and temporal dependencies of the software control system, so they cannot infer the final impact of network attacks. In light of this, this paper presents a modeling method that integrates the safety and security of cyber-physical systems. Firstly, the Attack-SEFTs model is proposed based on the SEFTs model. On this basis, common vulnerabilities in cyber-physical systems are presented, and various vulnerability patterns are modeled with Attack-SEFTs. Secondly, a unified representation of the Attack-SEFTs model is presented to support its analysis. Finally, a case study shows the feasibility of the proposed method.

Comparison of DGA Domain Detection Models Using Deep Learning

PEI Lan-zhen, ZHAO Ying-jun, WANG Zhe, LUO Yun-qian

Computer Science. 2019, 46 (5): 111-115. doi:10.11896/j.issn.1002-137X.2019.05.017

To address the difficulty of detecting DGA domain names, this paper proposes a new character-level DGA domain detection model based on deep learning. The model consists of a character embedding layer, a feature detection layer and a classification prediction layer. The character embedding layer encodes the DGA domain numerically. The feature detection layer uses a deep learning model to extract features automatically, and the classification prediction layer uses a neural network for classification. To select the optimal feature extraction model, LSTM and GRU models using the Bidirectional, Stack and Attention mechanisms, CNN models, and CNN models integrated with LSTM and GRU models were compared. The results show that LSTM and GRU models that combine the Stack and Attention mechanisms with the Bidirectional mechanism, CNN models, and CNN models integrated with LSTM and GRU models can all improve the detection effect. The DGA domain detection model using a CNN integrated with Bi-GRU obtains the best detection effect.

High-performance Association Analysis Method for Network Security Alarm Information

FU Ze-qiang, WANG Xiao-feng, KONG Jun

Computer Science. 2019, 46 (5): 116-121. doi:10.11896/j.issn.1002-137X.2019.05.018

In a network security defense system, the intrusion detection system produces massive amounts of redundant and erroneous security alerts in real time. Therefore, it is necessary to mine frequent item patterns from the association rules and sequential patterns of the alert information, distinguish normal behavior patterns and screen out real attack information. Compared with Apriori, FP-growth and other algorithms, the COFI-tree algorithm has a considerable performance advantage, but it still cannot meet the need for fast analysis of large-scale network security information. To this end, this paper proposes an improved association analysis algorithm for network security alert information based on the COFI-tree algorithm. The algorithm improves the performance of COFI-tree through a node addressing mode based on a reverse linked list and a frequent item processing method based on a new SD structure. Experimental results on the Kddcup99 dataset show that the method largely preserves accuracy, substantially reduces computing overhead, shortens processing time by more than 21% on average compared with the traditional COFI-tree algorithm, and solves the problem of slow association analysis on massive network alarm information.

Pre-cache Based Privacy Protection Mechanism in Continuous LBS Queries

GU Yi-ming, BAI Guang-wei, SHEN Hang, HU Yu-jia

Computer Science. 2019, 46 (5): 122-128. doi:10.11896/j.issn.1002-137X.2019.05.019

Location data bring huge economic benefits, but the leakage of location privacy follows. To counter the maximum movement boundary (MMB) attack in continuous R-range queries, this paper proposes a pre-cache based privacy protection mechanism. First, a pseudo-random generalization method is proposed to control the generalized area of the snapshot query while protecting location privacy. Then, the upcoming intersection is predicted within the generalized query area, and the next generalized query area is calculated from the intersection position and pre-cached. Pre-caching reduces the time correlation between consecutive queries and improves the privacy protection level. Performance analysis and experimental results show that the proposed mechanism effectively reduces the privacy leakage caused by the maximum movement boundary attack.

Malicious Information Source Locating Algorithm Based on Topological Extension in Online Social Network

YUAN De-yu, GAO Jian, YE Meng-xi, WANG Xiao-juan

Computer Science. 2019, 46 (5): 129-134. doi:10.11896/j.issn.1002-137X.2019.05.020

With the rapid development of online social networks, social media has become the main interactive platform for online users. Malicious information often hides in the massive data of online social networks. In addition, the limitations of topology and the disguise of malicious information make it difficult to locate and trace malicious information. On the one hand, global monitoring is difficult to achieve by manual labeling. Even with semantic analysis and information search, only the information “fragments” in the current network can be obtained after hotspots are recognized. Because information also mutates as it evolves, a propagation chain can be interrupted and split into multiple pieces. If these pieces are not identified and distinguished, the number of information sources increases and the algorithm complexity grows greatly. On the other hand, malicious information often uses camouflage techniques, such as providing false elements, manufacturing hotspots to attract users and manipulating online “water armies” to interfere with public opinion, which makes the information topology and the relation topology inconsistent. The original locating algorithm relies on the distribution of the currently infected nodes and the current topology. The infection state changes from singular to random, making the statistical inference framework more complex, so the state inference method for unobserved nodes needs to be improved. During information dissemination in online social networks, the propagation relationship is often attached to the information itself, so hidden information can be mined from the state of the current network nodes. Based on the current state of network nodes, this paper proposes the concepts of relation topology and information topology and designs a candidate source expansion algorithm based on the information topology. On this basis, this paper also presents a malicious information locating algorithm based on the Jordan center. Experiments on generated networks and real networks show that the algorithm identifies malicious information sources effectively compared with other algorithms.
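For readers unfamiliar with the Jordan center, the sketch below computes it in its plain form on an unweighted graph: the node whose largest shortest-path distance to any infected node is smallest. It does not reproduce the paper's candidate source expansion or its use of the information topology, and the graph and infected set are made up.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def jordan_center(graph, infected):
    """Node minimizing the maximum distance to the infected nodes."""
    best, best_ecc = None, float("inf")
    for node in graph:
        dist = bfs_distances(graph, node)
        ecc = max(dist.get(i, float("inf")) for i in infected)
        if ecc < best_ecc:
            best, best_ecc = node, ecc
    return best

# Hypothetical path graph a-b-c-d-e in which every node is infected.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
source = jordan_center(graph, infected=["a", "b", "c", "d", "e"])
```

On this path graph the middle node `c` is returned, since it is at most two hops from every infected node.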

Feature Model Refactoring Method Based on Semantics

ZHANG Li-sheng, ZHANG Yue, LEI Da-jiang

Computer Science. 2019, 46 (5): 135-142. doi:10.11896/j.issn.1002-137X.2019.05.021

In the domain engineering of software product line development, feature models are widely adopted to capture and organize reusable requirements. Currently, the construction of a feature model relies on the modeler’s analysis. As domain requirements grow more complex, building a feature model that satisfies the requirements not only increases the modeler’s workload but also reduces the accuracy of the feature model. To solve the problem of inconsistent modeling vocabulary between different feature models, this paper proposes a method for analyzing semantics and defining semantic terms. To refactor the feature model effectively, a semi-automated refactoring method is defined using Description Logic, which can also be used to infer the consistency of the model. The proposed method is verified on two feature models, and the results show that it can refactor the feature model and verify the consistency of the refactored feature model.

Optimization of Spark RDD Based on Non-serialization Native Storage

ZHAO Jun-xian, YU Jian

Computer Science. 2019, 46 (5): 143-149. doi:10.11896/j.issn.1002-137X.2019.05.022

More and more enterprises adopt the Spark framework as their big data computing framework. However, as the memory available on current servers increases, Spark does not adapt well to the new environment. Spark runs on the Java Virtual Machine (JVM). When heap memory is used heavily, the share of total job time that the JVM spends on garbage collection (GC), that is, reclaiming memory to make room for new objects, increases significantly, and the efficiency of Spark jobs does not improve proportionally as available memory grows. With the OffHeap (native) memory storage mode, the cost of serialization and deserialization replaces GC as the new bottleneck. This paper uses native storage to deal with the GC problem and speeds up jobs by reducing GC overhead. It also modifies the storage structure of Spark and improves the eviction mechanism and the caching of RDDs. Data are moved into native memory without serialization, which achieves low garbage collection overhead and avoids the time spent on serialization. Experimental results demonstrate that, on a single-node server with large memory, the GC cost of the modified method is 5% to 30% of that of Spark’s on-heap storage. Meanwhile, serialization overhead decreases, throughput increases and job running time is reduced by more than 8%.

DFTS:A Top-k Skyline Query for Large Datasets

WEI Liang, LIN Zi-yu, LAI Yong-xuan

Computer Science. 2019, 46 (5): 150-156. doi:10.11896/j.issn.1002-137X.2019.05.023

The Top-k Skyline query combines the features of the Top-k query and the Skyline query and can find the best objects in a dataset. However, the available methods do not scale well to large datasets. An efficient Top-k Skyline query method called DFTS is proposed, which performs well on large datasets. DFTS involves three steps. Firstly, a degree-score function is used to rank the dataset, and a large number of low-ranking objects are filtered out. Secondly, DFTS runs a Skyline query on the remaining candidates and generates a Skyline subset. Finally, the k highest-ranking objects are selected from the Skyline subset as the final result. Through these steps, DFTS significantly reduces the time cost. It is proved that the results of DFTS satisfy the requirements of the Top-k Skyline query. Extensive experimental results show that DFTS achieves much better performance on large datasets than state-of-the-art methods.
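The three steps can be sketched as a rank, filter, skyline and select pipeline. The abstract does not define the degree-score function, so the sketch substitutes a hypothetical score (the sum of the attributes, smaller being better) and a hypothetical `keep` cutoff; it illustrates the pipeline shape rather than the DFTS method itself.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def top_k_skyline(points, k, keep, score=sum):
    candidates = sorted(points, key=score)[:keep]   # step 1: rank and filter
    sky = skyline(candidates)                       # step 2: skyline of the candidates
    return sorted(sky, key=score)[:k]               # step 3: k best skyline objects

# Hypothetical two-attribute objects where smaller values are better.
points = [(1, 9), (2, 2), (9, 1), (5, 5), (8, 8), (3, 7)]
result = top_k_skyline(points, k=2, keep=4)
```

Filtering before the skyline step is what saves time: the pairwise dominance test runs on `keep` candidates instead of the whole dataset.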

Prediction Method of Cyclic Time Series Based on DTW Similarity

LI Wen-hai, CHENG Jia-yu, XIE Chen-yang

Computer Science. 2019, 46 (5): 157-162. doi:10.11896/j.issn.1002-137X.2019.05.024

Abstract

PDF(1819KB) ( 1981 )

References | Related Articles | Metrics

This paper presented a DTW distance-based sampling framework to effectively improve the accuracy of cyclic time series prediction on large-scale datasets. It addresses the problem of noise identification for each given prediction condition, and formalizes the impact of noise with the SVR-based prediction method. On top of the DTW-based similarity measurement, this paper presented an end-to-end identification method to improve the quality of the training set. It also introduced a regularization function into the kernel function of SVR, such that the generalization error can be minimized based on the distances between each training instance and the prediction condition. The experiments use a series of widely adopted cyclic time series to evaluate the precision and stability of the proposed method. The results demonstrate that, owing to the high-quality training instances and the weighted regularization strategy, the proposed method remarkably outperforms its competitors on most of the datasets.
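For readers unfamiliar with DTW, the following minimal sketch shows the classic dynamic-programming distance and one way to use it for selecting training instances. The selection helper `nearest_instances` is an assumption made for illustration; the abstract does not describe the paper's actual sampling rule.

```python
from math import inf
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_instances(query: Sequence[float],
                      history: List[Sequence[float]], k: int) -> List[Sequence[float]]:
    """Hypothetical sampling step: keep the k historical cycles closest to the query."""
    return sorted(history, key=lambda cycle: dtw_distance(query, cycle))[:k]
```

Because DTW allows one sample to align with several neighbours, two cycles that differ only by a local stretch (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) have distance zero, which is why it suits cyclic series whose periods drift slightly.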

Multi-output Intuitionistic Fuzzy Least Squares Support Vector Regression Algorithm

WANG Ding-cheng, LU Yi-yi, ZOU Yong-jie

Computer Science. 2019, 46 (5): 163-168. doi:10.11896/j.issn.1002-137X.2019.05.025

Abstract

PDF(2071KB) ( 1084 )

References | Related Articles | Metrics

Support vector regression is an important machine learning algorithm and has been applied successfully in some areas. However, single-output support vector regression (SVR) has a long training time and lacks practicality in some complex systems. The multi-output intuitionistic fuzzy least squares support vector regression (IFLS-SVR) introduces the intuitionistic fuzzy method on the basis of multi-output SVR to handle uncertain multi-output complex systems and reduce the training time. Most real-life applications are complex. Based on traditional support vector regression, this paper proposed a multi-output IFLS-SVR model. The multi-output IFLS-SVR transforms the actual data into fuzzy data by an intuitionistic fuzzy algorithm, and transforms the quadratic programming optimization problem into the solution of a series of linear equations. Compared with the existing fuzzy support vector regression, the multi-output IFLS-SVR uses the intuitionistic fuzzy method to calculate the membership function and exploits the least squares method to improve the training efficiency, thus reducing the training time and obtaining a more accurate solution. Compared with other methods, the multi-output IFLS-SVR achieves good results on a simulation model. Finally, the multi-output IFLS-SVR model also performs excellently when applied to predicting wind speed and wind direction.

Asynchronous Advantage Actor-Critic Algorithm with Visual Attention Mechanism

LI Jie, LING Xing-hong, FU Yu-chen, LIU Quan

Computer Science. 2019, 46 (5): 169-174. doi:10.11896/j.issn.1002-137X.2019.05.026

Abstract

PDF(1552KB) ( 1335 )

References | Related Articles | Metrics

Asynchronous deep reinforcement learning (ADRL) can greatly reduce the training time required for learning models by adopting multi-threading techniques. However, as an exemplary algorithm of ADRL, the asynchronous advantage actor-critic (A3C) algorithm fails to fully utilize some valuable regional information, leading to unsatisfactory model training performance. To address this problem, this paper proposed an asynchronous advantage actor-critic model with visual attention mechanism (VAM-A3C). VAM-A3C integrates the visual attention mechanism with the traditional asynchronous advantage actor-critic algorithm. By calculating the visual importance value of each area point in the whole image compared with the traditional Cofi algorithm, and obtaining the context vector of the attention mechanism via a regression function and a weighting function, the agent can focus on smaller but more valuable image areas to accelerate network model decoding and to learn the approximately optimal strategy more efficiently. Experimental results show the superior performance of VAM-A3C in some decision-making tasks based on visual perception compared with the traditional asynchronous deep reinforcement learning algorithm.

Short-term Bus Passenger Flow Prediction Based on Improved Convolutional Neural Network

CHEN Shen-jin, XUE Yang

Computer Science. 2019, 46 (5): 175-184. doi:10.11896/j.issn.1002-137X.2019.05.027

Abstract

PDF(1515KB) ( 2020 )

References | Related Articles | Metrics

To address the random, time-varying and uncertain nature of urban public transport passenger flow, this paper proposed a short-term bus station passenger flow prediction method based on unsupervised feature learning theory and an improved convolutional neural network, which provides real-time, accurate and effective bus travel services for citizens. In order to prevent and reduce over-fitting, an efficient and reliable prediction model is constructed based on the DropSample training method of the improved convolutional neural network. During the training process, the model can be used to describe the short-term passenger flow on different dates and in different time periods. The Adam optimizer is used to optimize the model: the network model parameters are updated, and adaptive learning rates are set for different parameters. The results show that the root mean square error of the improved CNN model is 229.539 and the mean absolute percentage error is 0.117. Compared with the CNN model, the multiple linear regression model, the Kalman filter model and the BP neural network model, this model is more accurate and reliable. The prediction error of the proposed method is smaller, and an example shows that the improved model and algorithm are practical and reliable.
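The two error measures quoted above follow their standard definitions, sketched below for reference. The function names are illustrative, and MAPE is returned as a fraction (so 0.117 corresponds to 11.7%), which is an assumption about how the abstract reports it.

```python
from math import sqrt
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction: mean(|y - y_hat| / |y|).

    Undefined when an actual value is zero, e.g. a time slot with no passengers.
    """
    return sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)
```

RMSE is expressed in the unit of the data (passengers per interval) and penalizes large misses, while MAPE is scale-free, which is why the two are usually reported together.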

Study on Check-in Prediction Based on Deep Learning and Factorization Machine

SU Chang, PENG Shao-wen, XIE Xian-zhong, LIU Ning-ning

Computer Science. 2019, 46 (5): 185-190. doi:10.11896/j.issn.1002-137X.2019.05.028

Abstract

PDF(2406KB) ( 1364 )

References | Related Articles | Metrics

Location-Based Social Networks (LBSN) provide users with location-based services, allowing mobile users to share their location and location-related information in social networks. Check-in prediction has become an important and very challenging task in LBSN. Most current prediction techniques focus on user-centered check-in studies, while few are POI-centered. This paper focused on POI-centered check-in prediction. Due to the extreme sparseness of the data, it is difficult for traditional models to mine users' potential check-in patterns from the data. To solve the POI-centered prediction problem, this paper proposed a novel network model (TSWNN) combining factorization machine and deep learning. This model fuses temporal features, spatial features and weather features, uses the idea of factorization machine to deal with high-dimensional sparse vectors, and applies fully-connected hidden layers to mine users' potential check-in patterns and predict users' check-in behavior at a specific point of interest. The experimental results on two classical LBSN datasets (Gowalla and Brightkite) show the superior performance of the proposed model.

MOEA/D Algorithm Based on New Neighborhood Updating Strategy

GENG Huan-tong, HAN Wei-min, ZHOU Shan-sheng, DING Yang-yang

Computer Science. 2019, 46 (5): 191-197. doi:10.11896/j.issn.1002-137X.2019.05.029

Abstract

PDF(2358KB) ( 1869 )

References | Related Articles | Metrics

To solve the lack of population diversity caused by the unrestricted replacement of the neighbourhood updating strategy when the MOEA/D algorithm solves complex optimization problems, a MOEA/D algorithm based on a new neighbourhood updating strategy (MOEA/D-ENU) was proposed. In the process of evolution, the algorithm fully exploits the information of the solutions, classifies each newly generated solution according to its neighbourhood updating capacity, and adaptively applies different neighbourhood updating strategies to different types of new solutions, so as to ensure the convergence rate of the population while taking its diversity into account. The proposed algorithm was compared with five other algorithms on 9 benchmarks including ZDT, UF and CF. The values of IGD and HV show that MOEA/D-ENU has certain advantages over the other algorithms in terms of convergence and distribution.

Neural Machine Translation Inclined to Close Neighbor Association

WANG Kun, DUAN Xiang-yu

Computer Science. 2019, 46 (5): 198-202. doi:10.11896/j.issn.1002-137X.2019.05.030

Abstract

PDF(1501KB) ( 1256 )

References | Related Articles | Metrics

The existing neural machine translation model only considers the relevance of the target end to the source end when modeling sequences, and does not model the associations within the source end or within the target end. In this paper, the source and target associations were modeled separately, and a reasonable loss function was designed. The source-side hidden layer is made more related to its neighboring K word hidden layers, and the target-side hidden layer is made more related to its historical M word hidden layers. The experimental results on a large-scale Chinese-English dataset show that, compared with neural machine translation that only considers the relevance of the target end to the source end, the proposed method can construct a better neighbor correlation representation and improve the translation quality of the machine translation system.

Imbalanced Data Classification Algorithm Based on Probability Sampling and Ensemble Learning

CAO Ya-xi, HUANG Hai-yan

Computer Science. 2019, 46 (5): 203-208. doi:10.11896/j.issn.1002-137X.2019.05.031

Abstract

PDF(1664KB) ( 1634 )

References | Related Articles | Metrics

Ensemble learning has attracted wide attention in class-imbalanced settings such as information retrieval, image processing and biology due to its generalization ability. To improve the performance of classification algorithms on imbalanced data, this paper proposed an ensemble learning algorithm, namely Oversampling Based on Probability Distribution-Embedding Feature Selection in Boosting (OBPD-EFSBoost). This algorithm mainly includes three steps. Firstly, the original data are oversampled based on probability distribution estimation to construct a balanced dataset. Secondly, when training base classifiers in each round, OBPD-EFSBoost increases the weight of misclassified samples and considers the effect of noise features on classification results, thus filtering out redundant noise features. Finally, the final ensemble classifier is obtained through weighted voting over the base classifiers. Experimental results show that the algorithm not only improves the classification accuracy for the minority class, but also eliminates the sensitivity of Boosting to noise features, and it has strong robustness.
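The second and third steps build on the usual Boosting mechanics. The sketch below shows the standard AdaBoost weight update and weighted vote for binary labels in {-1, +1}; it is a stand-in for illustration only and omits the oversampling and embedded feature selection that distinguish OBPD-EFSBoost, whose exact update rule is not given in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def boosting_round(weights: Sequence[float],
                   misclassified: Sequence[bool]) -> Tuple[float, List[float]]:
    """One AdaBoost round: return the classifier weight alpha and the new sample weights."""
    # Weighted error of the current base classifier (weights are assumed to sum to 1).
    err = sum(w for w, miss in zip(weights, misclassified) if miss)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    updated = [w * math.exp(alpha if miss else -alpha)
               for w, miss in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


def weighted_vote(predictions: Sequence[int], alphas: Sequence[float]) -> int:
    """Combine base-classifier outputs in {-1, +1} using their alpha weights."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1
```

After one round with four equally weighted samples and a single mistake, the misclassified sample carries half of the total weight, so the next base classifier is pushed to get it right.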

Multi-cost Decision-theoretic Rough Set Based on Covering Approximate Space

LUO Gong-zhi, XU Xin-xin

Computer Science. 2019, 46 (5): 209-213. doi:10.11896/j.issn.1002-137X.2019.05.032

Abstract

PDF(1233KB) ( 970 )

References | Related Articles | Metrics

To make up for the deficiencies of the decision-theoretic rough set model, which allows no overlap between concepts and ignores the importance of multiple cost matrices, a multi-cost decision-theoretic rough set model based on covering approximate space was proposed. Firstly, the problem of excessively fine granularity of classification in the decision-theoretic rough set based on the equivalence relation was analyzed. Considering the quantitative relation and the importance among the cost matrices, covering and weighted multiple cost matrices were introduced to improve the covering multi-cost decision-theoretic rough set. Then, for four kinds of multi-cost decision-theoretic rough sets based on covering approximate space, the rough approximations of knowledge were acquired and the relationships among them were discussed, and the relevant theorems and properties were proved. Finally, the feasibility and effectiveness of the method were verified by a medical diagnosis case.

BiLSTM-based Implicit Discourse Relation Classification Combining Self-attention Mechanism and Syntactic Information

FAN Zi-wei, ZHANG Min, LI Zheng-hua

Computer Science. 2019, 46 (5): 214-220. doi:10.11896/j.issn.1002-137X.2019.05.033

Abstract

PDF(1627KB) ( 1847 )

References | Related Articles | Metrics

Implicit discourse relation classification is a sub-task of shallow discourse parsing, and it is also an important task in natural language processing (NLP). An implicit discourse relation is a logical semantic relation inferred from the argument pairs in discourse relations. The analytical results of implicit discourse relations can be applied to many natural language processing tasks, such as machine translation, automatic document summarization and question answering systems. This paper proposed a method based on the self-attention mechanism and syntactic information for the classification of implicit discourse relations. In this method, a Bidirectional Long Short-Term Memory network (BiLSTM) is used to model the input argument pairs with syntactic information and to represent the argument pairs as low-dimensional dense vectors. The argument pair information is then filtered by the self-attention mechanism. Finally, this paper conducted experiments on the PDTB 2.0 dataset. The experimental results show that the proposed model achieves better results than the baseline system.

Improved CycleGANs for Intravascular Ultrasound Image Enhancement

YAO Zhe-wei, YANG Feng, HUANG Jing, LIU Ya-qin

Computer Science. 2019, 46 (5): 221-227. doi:10.11896/j.issn.1002-137X.2019.05.034

Abstract

PDF(4308KB) ( 1647 )

References | Related Articles | Metrics

Low-frequency and high-frequency ultrasound probes used in intravascular ultrasound (IVUS) image acquisition have their own characteristics. Doctors have to choose ultrasound probes with different frequencies according to clinical needs during the diagnosis of coronary atherosclerosis and other diseases. Therefore, Cycle Generative Adversarial Networks (CycleGANs) based on the Wasserstein distance were presented for intravascular ultrasound image enhancement, to incorporate high-frequency ultrasonic details and overcome the edge blur and low resolution of low-frequency ultrasound images, assisting doctors in the diagnosis of cardiovascular disease. Firstly, according to the shape characteristics of the coronary artery, several data augmentation approaches, such as rotation, scaling up or down and gamma transformation, are applied to increase the number of IVUS samples in the training set, in order to reduce the risk of over-fitting during the training stage. Then, in the spirit of adversarial training, a joint loss function based on adversarial loss and cycle-consistent loss is constructed. Finally, the Wasserstein distance is added to the loss function as a regularization term to stabilize the training and accelerate the convergence process. The input of this model is a low-frequency IVUS image and the output is an enhanced IVUS image containing high-frequency detail information. An international standard IVUS image database was used for verification in the experiment. Clarity, contrast and edge energy were used as quantitative evaluation criteria. It is verified that the convergence speed of this model is twice that of the original CycleGANs model, and the three evaluation criteria are increased by 15.8%, 11.4% and 46.6%, respectively. The experimental results show that the W-CycleGANs model can learn the feature information of the image domain effectively. Compared with the original CycleGANs algorithm, it can further enrich the details of image edges, enhance the diagnostic information, and improve the sensitivity of doctors in diagnosing cardiovascular disease. In addition, 100 clinical IVUS images were used for verification, and good enhancement results were obtained.

Multi-semantic Interaction Based Iterative Scene Understanding Framework

YAO Tuo-zhong, ZUO Wen-hui, AN Peng, SONG Jia-tao

Computer Science. 2019, 46 (5): 228-234. doi:10.11896/j.issn.1002-137X.2019.05.035

Abstract

PDF(4696KB) ( 1307 )

References | Related Articles | Metrics

Traditional feed-forward visual systems have been widely used for years, and one fatal defect of such systems is that they cannot correct their own mistakes while working, which degrades performance. This paper proposed a simple interactive framework, which resolves the semantic uncertainty of the scene through the cooperation of multiple visual analysis processes, leading to optimized scene understanding. In this framework, three classic scene understanding algorithms are used as visual analysis modules, and their outputs, such as surface layout, boundary, depth, viewpoint and object class, are shared with each other by contextual interaction, so as to improve their own performance iteratively. The proposed framework does not need man-made constraints and can add new models without large modifications of the original framework and algorithms, so it has good scalability. The experimental results on the Geometric Context dataset demonstrate that this intrinsic information interaction based system has better flexibility and performs better than traditional feed-forward systems. The mean accuracy of surface layout, boundary and viewpoint estimation is increased by more than 5%, and the mean accuracy of object detection is increased by more than 6%. This attempt can be an efficient way of improving traditional visual systems.

Remote Sensing Image Classification Based on Heterogeneous Machine Learning Algorithm Fusion

TIAN Zhen-kun, FU Ying-ying, LIU Su-hong

Computer Science. 2019, 46 (5): 235-240. doi:10.11896/j.issn.1002-137X.2019.05.036

Abstract

PDF(3588KB) ( 1387 )

References | Related Articles | Metrics

In applications of multi-spectral remote sensing data, such as land cover change, environmental monitoring and thematic information extraction, the classification accuracy is not high enough due to the uncertainty of remote sensing information acquisition and processing. In order to further improve the classification accuracy, this paper proposed a fusion algorithm based on 6 heterogeneous machine learning classifiers. This algorithm provides classification results at the abstract level, ranked level and measurement level by using a prior knowledge set composed of the precision and recall matrix, the Accuracy and Difference (AD) index of the classifier combination, and the 3-dimensional probability matrix. Based on Landsat 8 image data, the study area of Beijing is classified by the proposed fusion algorithm and by other algorithms respectively. Experimental results show that the 3-classifier combination composed of NB, KNN and SVM obtains the maximum AD value and the best classification effect. The abstract level output of the algorithm is 12.28% higher than the average accuracy of the 6 single classifiers and 2.24% higher than that of the best single classifier, SVM. Whereas commonly used algorithms such as Random Forest (RF), Bagging and Boosting fail in the case of “strong member classifiers”, the proposed fusion algorithm still performs well, with accuracy 11.23%, 7.56% and 11.36% higher than RF, Bagging and Boosting respectively. The proposed fusion algorithm can effectively improve the classification accuracy of remote sensing data by making full use of the diversity of classifiers and of prior knowledge such as the precision and recall matrix in the process of classification.
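The per-class precision and recall that feed the prior knowledge set can be derived from a classifier's confusion matrix, as in the minimal sketch below. The layout (rows are true classes, columns are predicted classes) and the function name are assumptions; how the paper combines these values with the AD index and the probability matrix is not described in the abstract.

```python
from typing import List, Tuple


def precision_recall(confusion: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Per-class precision and recall from a square confusion matrix.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    n = len(confusion)
    precision, recall = [], []
    for c in range(n):
        predicted_as_c = sum(confusion[i][c] for i in range(n))  # column sum
        truly_c = sum(confusion[c])                              # row sum
        precision.append(confusion[c][c] / predicted_as_c if predicted_as_c else 0.0)
        recall.append(confusion[c][c] / truly_c if truly_c else 0.0)
    return precision, recall
```

A class with high precision but low recall tells the fusion stage that the classifier is trustworthy when it does predict that class, even though it misses many instances; this is the kind of per-classifier prior that a fusion rule can exploit.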

Multi-contrast Carotid MRI 3D Registration Method Based on Spatial Alignment and Contour Matching

WANG Xiao-yan, LIU Qi-qi, HUANG Xiao-jie, JIANG Wei-wei, XIA Ming

Computer Science. 2019, 46 (5): 241-246. doi:10.11896/j.issn.1002-137X.2019.05.037

Abstract

PDF(2129KB) ( 1108 )

References | Related Articles | Metrics

Multi-contrast high-resolution magnetic resonance imaging (MRI) technology can non-invasively display the vessel wall structure and plaque composition, providing an effective method for the diagnosis and analysis of carotid atherosclerotic plaque. The registration of vessels in multi-contrast images is a critical task for plaque identification. This paper proposed a three-dimensional registration algorithm based on spatial position alignment and lumen contour matching. With multi-contrast carotid MRI, a coarse-to-fine strategy was adopted. Firstly, the physical coordinates are found to perform the spatial alignment. Then, the Otsu algorithm and an active contour model are used to complete the semi-automatic continuous segmentation of the blood vessel lumens. Finally, the lumen contour point clouds are utilized to perform three-dimensional rigid registration based on an improved iterative closest point algorithm. The results indicate that the three-dimensional average lumen inclusion rate between the TOF and T₁Gd sequences reaches 92.79%, and the average lumen inclusion rate between the T₁WI and T₁Gd sequences reaches 94.66%. The proposed algorithm achieves accurate three-dimensional registration of multi-contrast MRI, which lays the foundation for the subsequent analysis of vulnerable atherosclerotic plaque.
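The Otsu thresholding used in the segmentation step picks the gray level that maximizes the between-class variance of the image histogram. A minimal pure-Python sketch for 8-bit intensities follows; it illustrates only this standard step and not the paper's active contour refinement or registration. The flat pixel list and the function name are assumptions made for brevity.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t in [0, 255] maximizing between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist: List[int] = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue      # lower class still empty
        if w1 == 0:
            break         # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

In a lumen segmentation setting the resulting binary mask (`pixel > t`) would typically serve as the initial region that a contour model then refines.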

Melanoma Classification Method by Integrating Deep Convolutional Residual Network

HU Hai-gen, KONG Xiang-yong, ZHOU Qian-wei, GUAN Qiu, CHEN Sheng-yong

Computer Science. 2019, 46 (5): 247-253. doi:10.11896/j.issn.1002-137X.2019.05.038

Abstract

PDF(2716KB) ( 1887 )

References | Related Articles | Metrics

To address the difficulties of melanoma classification, such as low contrast, lesions indistinguishable by the naked eye, heavy information interference, small datasets and data imbalance, this paper proposed an integrated classification method based on mask data augmentation and a deep convolutional residual network. Firstly, according to the characteristics of skin lesion images and previous research, two data augmentation methods that mask partial areas of the training images were proposed. Secondly, on the basis of these two data augmentation methods, features were extracted by using a 50-layer deep convolutional residual network. Thirdly, two different classification models were constructed and integrated based on these features. Finally, a series of experiments were conducted on the datasets of the International Skin Imaging Collaboration (ISIC) 2016 Challenge. The experimental results show that the integrated classification model can overcome the deficiencies of a single convolutional residual network in melanoma classification tasks and achieves better classification results than other methods on skin lesion datasets with few training examples, and that multiple evaluation indicators of the proposed method are better than the top-5 results in the ISIC 2016 Challenge.

Multi-modal Medical Volumetric Image Fusion Based on 3-D Shearlet Transform and Generalized Gaussian Model

XI Xin-xing, LUO Xiao-qing, ZHANG Zhan-cheng

Computer Science. 2019, 46 (5): 254-259. doi:10.11896/j.issn.1002-137X.2019.05.039

Abstract

PDF(5893KB) ( 1267 )

References | Related Articles | Metrics

In view of the limitation that most traditional multi-modal medical image fusion methods cannot deal with medical volumetric images, this paper presented a multi-modal medical volumetric image fusion method based on the 3-D shearlet transform (3DST) and the generalized Gaussian model. Firstly, the preregistered medical volumetric images are decomposed into low frequency parts and high frequency parts by using the 3DST. Next, a novel fusion rule based on local energy is applied to the low frequency subbands. Moreover, an effective fusion rule based on the Generalized Gaussian Model (GGD) and fuzzy logic is proposed for integrating the high frequency subbands. Finally, the fused image is obtained by the inverse 3DST. Through subjective and objective performance comparison, experiments on medical volumetric images show that the proposed method can obtain better fusion results.

Wind Turbine Visual Inspection Based on GoogLeNet Network in Transfer Learning Mode

XU Yi-ming, ZHANG Juan, LIU Cheng-cheng, GU Ju-ping, PAN Gao-chao

Computer Science. 2019, 46 (5): 260-265. doi:10.11896/j.issn.1002-137X.2019.05.040

Abstract

PDF(3396KB) ( 1176 )

References | Related Articles | Metrics

To cope with the interference of shooting angle changes and insignificant features in the drone aerial photography environment, this paper proposed an improved GoogLeNet convolutional neural network to identify and locate wind turbines, which can automatically extract wind turbine category features without manual pre-selection. The deep feature vectors of wind turbines are constructed through the GoogLeNet network. In the network model training process, the concept of transfer learning is introduced, and the pre-trained GoogLeNet network is further trained by using wind turbine images. This prevents the classification network from falling into a local optimum while speeding up the model training. The region proposal network and the multi-task loss function are used to integrate the candidate region search and bounding box regression into the network in the Faster RCNN framework, so that the wind turbines in the aerial image can be automatically classified and annotated, and the time complexity can be reduced. Experimental results show that, by means of transfer learning, the optimized GoogLeNet network can improve the accuracy of visual target detection in the complex aerial photography environment and complete the task of automatic wind turbine positioning. The average accuracy of wind turbine detection based on GoogLeNet is over 96%.

Adaptive Dictionary Learning Algorithm Based on Image Gray Entropy

DU Xiu-li, ZUO Si-ming, QIU Shao-ming

Computer Science. 2019, 46 (5): 266-271. doi:10.11896/j.issn.1002-137X.2019.05.041

Abstract

PDF(1908KB) ( 1306 )

References | Related Articles | Metrics

The traditional dictionary learning algorithm for image sparse representation learns only a single dictionary from the training images, and therefore cannot optimally sparsely represent image blocks that contain different image information. To address this problem, this paper introduced the local gray entropy of the image into the dictionary learning algorithm and proposed an adaptive dictionary learning algorithm based on image local gray entropy. The proposed algorithm uses an image database as the training sample. Firstly, the images in the database are divided into blocks, and the gray entropy of each sub-block is calculated. Then, the sub-blocks are classified according to the size of the gray entropy, and different K-Singular Value Decomposition (K-SVD) parameters are set for different categories of sub-blocks to perform dictionary training separately, thus obtaining a plurality of different dictionaries. Lastly, for each image sub-block, a trained dictionary is selected according to the size of its gray entropy to conduct sparse representation. Simulation experiment results show that the proposed algorithm can sparsely represent images better, and the effect of image reconstruction is also improved significantly.
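The gray entropy that drives the block classification can be read as the Shannon entropy of a block's gray-level histogram. The sketch below computes it and maps a block to a dictionary index; the entropy thresholds and the names `gray_entropy` and `pick_dictionary` are hypothetical, since the abstract does not give the paper's class boundaries or K-SVD settings.

```python
from bisect import bisect_right
from collections import Counter
from math import log2
from typing import Sequence


def gray_entropy(block: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the gray-level distribution of a flattened block."""
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def pick_dictionary(entropy: float, thresholds: Sequence[float]) -> int:
    """Index of the dictionary for a block, given ascending entropy thresholds.

    With thresholds [1.0, 3.0]: entropy < 1.0 -> 0, 1.0 <= entropy < 3.0 -> 1, otherwise 2.
    """
    return bisect_right(thresholds, entropy)
```

A flat block has entropy 0, while a block with four equally frequent gray levels has entropy 2 bits, so smooth and textured regions end up with different dictionaries.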

Fast Stripe Extraction Method for Structured Light Images with Uneven Illumination

ZHENG Hong-bo, SHI Hao, DU Yi-cheng, ZHANG Mei-yu, QIN Xu-jia

Computer Science. 2019, 46 (5): 272-278. doi:10.11896/j.issn.1002-137X.2019.05.042

Abstract

PDF(3682KB) ( 1675 )

References | Related Articles | Metrics

The stripe extraction of structured light images can be easily affected by uneven illumination, and the accuracy of the extracted stripes is an important prerequisite for the accuracy of the subsequent 3D reconstruction. Therefore, the goal of this study is to eliminate the influence of uneven illumination and accurately extract the stripes of structured light images. This paper proposed a processing algorithm combining Gaussian filtering and mean filtering, which is suitable for stripe extraction from structured light images with uneven illumination. The algorithm not only effectively eliminates the influence of uneven illumination on the image, but also retains the feature information of the original image and achieves good experimental results. In order to speed up the filtering process, this paper used separable filters to improve the algorithm, reducing the computational complexity. In addition, the GPU-based parallel computing technique CUDA is used to accelerate the algorithm, and the processing speed is improved greatly.
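The complexity gain from separable filters comes from replacing one k×k pass with a 1×k row pass followed by a k×1 column pass, i.e. 2k instead of k² multiplications per pixel. The following sketch checks that equivalence on the valid image region for a symmetric 1-D kernel; it illustrates the general idea only and is not the paper's CUDA implementation.

```python
from typing import List, Sequence

Image = List[List[float]]


def filter_rows(img: Image, kernel: Sequence[float]) -> Image:
    """Apply a 1-D kernel along each row (valid region only, no padding)."""
    k = len(kernel)
    return [[sum(row[j + t] * kernel[t] for t in range(k))
             for j in range(len(row) - k + 1)]
            for row in img]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def separable_filter(img: Image, kernel: Sequence[float]) -> Image:
    """Row pass, then column pass: about 2k multiplications per output pixel."""
    return transpose(filter_rows(transpose(filter_rows(img, kernel)), kernel))


def full_filter(img: Image, kernel: Sequence[float]) -> Image:
    """Direct 2-D filtering with the outer-product kernel: k*k multiplications per pixel."""
    k, h, w = len(kernel), len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a] * kernel[b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]
```

Both Gaussian and mean (box) kernels are separable, so the same two-pass structure applies to the two filters the abstract combines; on a GPU each pass also maps naturally onto one thread per output pixel.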

Multi-target Tracking of Cancer Cells under Phase Contrast Microscopic Images Based on Convolutional Neural Network

HU Hai-gen, ZHOU Li-li, ZHOU Qian-wei, CHEN Sheng-yong, ZHANG Jun-kang

Computer Science. 2019, 46 (5): 279-285. doi:10.11896/j.issn.1002-137X.2019.05.043

Abstract

PDF(2280KB) ( 1575 )

References | Related Articles | Metrics

Detecting and tracking cancer cells in phase contrast microscopic images plays a critical role in analyzing the life cycle of cancer cells and developing new anti-cancer drugs. Traditional target tracking methods are mostly applied to rigid target tracking or single target tracking, while cancer cells are non-rigid multiple targets that divide constantly, which makes tracking more challenging. Taking bladder cancer cells in sequences of phase contrast micrographs as the research object, this paper proposed a multi-target tracking method for cancer cells based on a convolutional neural network. Firstly, following the detection-based multi-target tracking approach, the proposed algorithm adopted the deep learning detection framework Faster R-CNN to detect the bladder cancer cells and preliminarily obtain the cancer cells to be tracked. Then, CSA (circle scanning algorithm) was utilized to optimize the detection of adherent cancer cells and further improve the detection accuracy of cells in adhesion areas. Finally, it integrated the convolutional, size and position features into a synthetic feature descriptor by weighting, and tracked multiple cancer cells by efficiently associating and matching cancer cells across different frames. The results of a series of experiments show that this method can not only improve the accuracy of detecting and tracking cancer cells, but also deal with the occlusion problem effectively.

Speech Recognition Combining CFCC and Teager Energy Operators Cepstral Coefficients

SHI Yan-yan, BAI Jing

Computer Science. 2019, 46 (5): 286-289. doi:10.11896/j.issn.1002-137X.2019.05.044

Abstract

PDF(1308KB) ( 1531 )

References | Related Articles | Metrics

In view of the imperfection of the existing features that represent speech characteristics, this paper proposed a feature fusion method based on Cochlear Filter Cepstral Coefficients (CFCC) and Teager Energy Operators Cepstral Coefficients (TEOCC). First, the fusion feature of CFCC, which reflects human auditory characteristics, and TEOCC, which embodies nonlinear energy characteristics, is applied to the speech recognition system. Then principal component analysis (PCA) is applied to the selection and optimization of the fusion features. Finally, a support vector machine is used for speech recognition. The results show that the proposed fusion features achieve better speech recognition performance than a single feature, and after combining PCA, the accuracy rate of speech recognition is increased by 3.7% on average.
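The nonlinear energy measure underlying TEOCC is the discrete Teager energy operator, commonly defined as ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below applies it to a sampled signal; it covers only the operator itself, not the cepstral processing that turns it into TEOCC features, and the function name is illustrative.

```python
from typing import List, Sequence


def teager_energy(signal: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    return [signal[n] ** 2 - signal[n - 1] * signal[n + 1]
            for n in range(1, len(signal) - 1)]
```

For a pure tone A·cos(Ωn + φ) the operator yields the constant A²·sin²(Ω), so it responds to both amplitude and frequency, which is the nonlinear energy characteristic that amplitude-only energy measures miss.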

Improved Learning Model for Cloud Computing Swarm Optimization Time Efficiency

JIAN Cheng-feng, KUANG Xiang, ZHANG Mei-yu

Computer Science. 2019, 46 (5): 290-297. doi:10.11896/j.issn.1002-137X.2019.05.045

Abstract

PDF(3014KB) ( 938 )

References | Related Articles | Metrics

To address the long running time of traditional cloud computing task scheduling models, this paper proposed an ONBA algorithm combined with DE (Differential Evolution) to obtain task scheduling data. The obtained scheduling data are then used to train the improved IDBN model. By adjusting the learning rate and the number of training iterations, the time efficiency can be improved, thus achieving fast and accurate prediction of cloud computing scheduling results. The experimental results show that the improved IDBN model trained by this method can effectively shorten the actual scheduling time while keeping the prediction results precise, and makes up for the long running time of traditional swarm optimization models.

Divide-and-Conquer Algorithm for Sparse Polynomial Interpolation

DENG Guo-qiang, TANG Min, LIANG Zhuang-chang

Computer Science. 2019, 46 (5): 298-303. doi:10.11896/j.issn.1002-137X.2019.05.046

Abstract

PDF(1511KB) ( 1444 )

References | Related Articles | Metrics

Sparse interpolation is widely used in many applications and areas of science and engineering. Its goal is to recover the target polynomial from given discrete values by taking advantage of the polynomial's sparse structure. For large polynomials, current methods have high time complexity, because the size and the number of algebraic operations depend on the number of terms and the total degree of the target polynomial. For this reason, this paper presented a divide-and-conquer algorithm for sparse polynomial interpolation over finite fields. The basic strategy is to choose one of the variables as the main variable, so that the coefficients are multivariate polynomials in the other variables. In this way, the original polynomial interpolation is divided into a list of univariate polynomial interpolations and a list of smaller sub-polynomials. The solution of the original problem is obtained by merging these sub-polynomials. To implement the divide-and-conquer strategy for sparse polynomial interpolation, this paper designed four sub-algorithms: univariate polynomial interpolation based on an early termination strategy, univariate polynomial interpolation with a priori knowledge of the total degree of the polynomial, determination of the number of terms of the polynomial via the Hankel matrix determinant, and Ben-Or/Tiwari’s algorithm with an upper bound on the number of terms. In numerical experiments, the performance of the new algorithm is compared with that of Zippel’s algorithm, Ben-Or/Tiwari’s algorithm and Javadi/Monagan’s algorithm. Extensive experiments show that the new algorithm is much faster than the other three algorithms. The experimental data demonstrate that the use of divide-and-conquer and the early termination strategy not only removes the need for some a priori knowledge of the total degree and the number of terms of the target polynomial, but also decomposes a large number of higher-order algebraic operations into smaller ones. Therefore, the bottleneck of large-scale multivariate polynomial interpolation problems is effectively solved.
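
The idea of reading the number of terms off a Hankel matrix can be sketched for the univariate case. For f with t nonzero terms and evaluations a_i = f(omega^i) over GF(p), the Hankel matrix [a_(i+j)] of size T >= t has rank exactly t, provided the values omega^e are distinct for the exponents e of f. The sketch below computes that rank instead of a sequence of determinants, which is a simplification and not the paper's procedure; the prime 101, the base 2 and the sample polynomial are illustrative assumptions.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), by Gauss-Jordan elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero pivot.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # inverse via Fermat, p prime
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def count_terms(f, omega, p, term_bound):
    """Number of terms of a black-box polynomial f over GF(p).

    Assumes term_bound >= true number of terms and that omega^e is
    distinct for every exponent e of f (e.g. omega primitive, e < p - 1).
    """
    values = [f(pow(omega, i, p)) % p for i in range(2 * term_bound - 1)]
    hankel = [[values[i + j] for j in range(term_bound)]
              for i in range(term_bound)]
    return rank_mod_p(hankel, p)

P = 101  # small prime, for illustration only

def sample_poly(x):
    # 3*x^7 + 5*x^2 + 1 has three terms.
    return (3 * x**7 + 5 * x**2 + 1) % P

terms = count_terms(sample_poly, 2, P, term_bound=5)
```

Here `terms` is 3, matching the three terms of `sample_poly`. Once the term count is known, a Ben-Or/Tiwari style reconstruction needs only 2t evaluations.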

Implementation Technology and Application of Web Crawler for Multi-data Sources

ZENG Jian-rong, ZHANG Yang-sen, ZHENG Jia, HUANG Gai-juan, CHEN Ruo-yu

Computer Science. 2019, 46 (5): 304-309. doi:10.11896/j.issn.1002-137X.2019.05.047

Abstract

PDF(1533KB) ( 2171 )

References | Related Articles | Metrics

Research on social computing methods based on big data technology is a hot spot in academia, and obtaining the corresponding data resources from the network is the key to this research. At present, Web crawler technology is the main method for collecting network data. Because existing crawler technology does not collect multi-source data easily, this paper proposed a Web crawler data acquisition technology for multiple data sources. Building on six data collection crawlers for media platforms including Sina micro-blog, People’s Daily, Baidu Baike, Baidu Tieba, WeChat public accounts and Easter Wealth Stock Bar, the Web crawlers for multiple data sources are fused through the back-end scheduling technology Servlet to solve the problem of data collection across different media platforms. In the implementation, firstly, the Web application testing toolkit Selenium is used to simulate manual actions such as logging in; then the element query technology XPath is used to parse the source code of the Web page, extract the data and store them in the database; finally, the data crawled from multiple sources are read from the database and displayed on front-end Web pages. Experiments show that the crawler maximizes acquisition efficiency while ensuring data integrity.
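
The extract-and-store step described above can be sketched with the Python standard library alone. This is not the paper's implementation, which uses Selenium, XPath and a Servlet back end: the page snippet, the class names and the table layout below are hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, which real Web pages often are not.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page source standing in for a fetched page.
PAGE = """<html><body>
<div class="post"><span class="author">alice</span><p class="text">first post</p></div>
<div class="post"><span class="author">bob</span><p class="text">second post</p></div>
<div class="ad"><p class="text">sponsored</p></div>
</body></html>"""

def extract_posts(page_source):
    """Return (author, text) pairs for every div with class 'post'."""
    root = ET.fromstring(page_source)
    rows = []
    # './/div[@class="post"]' selects post containers at any depth.
    for post in root.findall(".//div[@class='post']"):
        author = post.findtext("span[@class='author']")
        text = post.findtext("p[@class='text']")
        rows.append((author, text))
    return rows

def store_posts(rows):
    """Insert the extracted rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, body TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    conn.commit()
    return conn

posts = extract_posts(PAGE)
db = store_posts(posts)
```

Keeping extraction and storage in separate functions mirrors the abstract's pipeline, in which each platform-specific crawler feeds a shared database that the front end reads from.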

Long-term Operational Situation Assessment System for Terminal Buildings

HENG Hong-jun, WANG Rui

Computer Science. 2019, 46 (5): 310-314. doi:10.11896/j.issn.1002-137X.2019.05.048

Abstract

PDF(1291KB) ( 1008 )

References | Related Articles | Metrics

With the rapid development of airport construction and the increase in passenger traffic, how to make efficient use of the equipment and facilities in terminal buildings and provide high-quality services for passengers has become an urgent problem for airport managers. To give airport managers an intuitive understanding of the daily operating conditions of the terminal building, this paper constructed a long-term operational situation assessment system for terminal buildings based on the fuzzy comprehensive evaluation method. The system takes the daily operation status of the terminal building as the research object, uses the fuzzy comprehensive evaluation method to assess the operation status of the terminal building, and determines its operating situation level and the corresponding characteristics. A case study verifies that the rating feature descriptions of the evaluation system are consistent with the actual operation of the airport terminal building, which further demonstrates the applicability and effectiveness of the long-term operational situation assessment system for terminal buildings.
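
Fuzzy comprehensive evaluation, in its common weighted-average form, combines an indicator weight vector W with a membership matrix R as B = W · R and then picks the grade with the largest membership. The sketch below shows only that generic calculation; the indicators, weights, membership values and grade names are invented for illustration and do not come from the paper.

```python
def fuzzy_evaluate(weights, membership, grades):
    """Weighted-average fuzzy comprehensive evaluation.

    weights:    one weight per indicator (assumed to sum to 1).
    membership: one row per indicator, one column per grade.
    grades:     grade labels, one per column.
    Returns (best_grade, scores), using the maximum-membership principle.
    """
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(len(grades))
    ]
    best = max(range(len(grades)), key=scores.__getitem__)
    return grades[best], scores

# Hypothetical indicators: check-in, security screening, boarding gates.
weights = [0.5, 0.3, 0.2]
membership = [
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
grades = ["busy", "normal", "idle"]
level, scores = fuzzy_evaluate(weights, membership, grades)
```

With these numbers the scores are approximately 0.23, 0.51 and 0.26, so the evaluated level is "normal".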

Comprehensive Evaluation of Network Service Quality Based on Cloud Model and Improved Grey Relational Analysis Model

SUN Ming-wei, QI Yu-dong

Computer Science. 2019, 46 (5): 315-319. doi:10.11896/j.issn.1002-137X.2019.05.049

Abstract

PDF(1372KB) ( 1240 )

References | Related Articles | Metrics

With the rapid development of multimedia technology and high-speed network technology, new network applications with higher quality requirements are constantly emerging. Current traditional computer networks can only provide “best-effort” service, and can neither guarantee high quality of service nor evaluate suddenly emerging new services quickly and effectively. To address this problem, this paper measured the performance parameters of network service quality and proposed a comprehensive evaluation model of network service quality based on the cloud model and an improved grey relational analysis model, so as to classify measurement data rapidly and in real time. The experiment shows that this method produces accurate evaluation results and offers useful guidance for exploring comprehensive evaluation methods.
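
The grey relational part of such a model can be illustrated with the classical (Deng) grey relational grade, which scores how closely each measured series follows a reference series. The abstract does not describe the paper's improvement or how the cloud model is combined with it, so the sketch covers only the textbook formula; the distinguishing coefficient `rho = 0.5` and the sample data are assumptions, and the inputs are taken to be already normalized.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Classical grey relational grade of each candidate series.

    coefficient(k) = (d_min + rho * d_max) / (d(k) + rho * d_max),
    where d(k) = |reference[k] - candidate[k]| and d_min, d_max are taken
    over all candidates and all k. The grade is the mean coefficient.
    """
    deltas = [[abs(r - x) for r, x in zip(reference, series)]
              for series in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate coincides with the reference.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Hypothetical normalized QoS indicators (1.0 = ideal).
ideal = [1.0, 1.0, 1.0]
measured = [
    [1.0, 1.0, 1.0],
    [0.5, 0.8, 0.9],
]
grades = grey_relational_grades(ideal, measured)
```

A higher grade means the measured series lies closer to the reference; ranking or thresholding the grades then yields a service-quality class.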

Study on Information Propagation Dynamics Model and Opinion Evolution Based on Public Emergencies

LIU Xiao-yang, HE Dao-bing

Computer Science. 2019, 46 (5): 320-326. doi:10.11896/j.issn.1002-137X.2019.05.050

Abstract

PDF(2227KB) ( 1966 )

References | Related Articles | Metrics

Because traditional evolutionary models of information dissemination for public emergencies do not introduce dynamic parameters, this paper proposed a dynamic diffusion system and a mathematical model, based on propagation dynamics, for the evolution of public opinion on public event information. Firstly, the information dissemination of public emergencies is analyzed and designed. Secondly, a dynamic diffusion network is designed and combined with propagation dynamics to construct the mathematical model of public emergency information propagation. Finally, the model is simulated, analyzed and compared with real social statistics. The results show that the similarity between experimental data and real data is 0.8386, and the correlation coefficient is 0.8279. The proposed model reveals the inherent laws of information exchange between individuals at the micro level and of public opinion transmission, and is consistent with the propagation process of real events, which shows that the proposed model is reasonable and effective.

Application of Grey Prediction Model in Prediction of Stability of Wedge-shaped Body of Tunnel

WU Fa-you, WANG Lin-feng, WENG Qi-neng

Computer Science. 2019, 46 (5): 327-330. doi:10.11896/j.issn.1002-137X.2019.05.051

Abstract

PDF(1290KB) ( 1251 )

References | Related Articles | Metrics

With the construction of a large amount of transport infrastructure in China, tunnel engineering is unavoidable, especially in the western region. In tunnels, instability of the roof wedge is one of the hazards during construction. Monitoring and predicting the wedge is of great significance for ensuring the safety of tunnel construction and later operation. Prediction allows timely and effective measures to be taken to eliminate risks and avoid economic losses and casualties. The factors influencing tunnel wedge instability are complex, uncertain and difficult to quantify, which matches the characteristics of a grey system. This paper applied grey system theory to the prediction of tunnel wedge deformation, and established a GM(1,1) single-point prediction model of tunnel wedge deformation based on the original monitoring data. The accuracy of the model was verified by an engineering example. In the prediction model, the posterior error ratio C is 0.1195 and the small error probability P is 1. The results show that the prediction model reaches a high level of precision, and the prediction results can guide actual construction well.
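
The standard GM(1,1) grey model and the two accuracy measures quoted above (posterior error ratio C and small error probability P) can be sketched as follows. This is the textbook formulation, not the paper's code: the data are synthetic rather than tunnel monitoring values, the population standard deviation is an assumed convention, and the development coefficient `a` is assumed to be nonzero.

```python
import math
from statistics import mean, pstdev

def gm11_fit(x0):
    """Least-squares estimate of (a, b) in x0(k) + a * z1(k) = b."""
    x1, total = [], 0.0
    for v in x0:  # accumulated generating sequence
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # slope equals -a
    b = (sy - slope * sz) / m
    return -slope, b

def gm11_predict(x0, a, b, steps):
    """Fitted and forecast values for the first `steps` points."""
    def x1_hat(k):  # time response of the accumulated series, k from 0
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, steps)]

def posterior_check(x0, fitted):
    """Posterior error ratio C and small error probability P."""
    residuals = [o - f for o, f in zip(x0, fitted)]
    s1 = pstdev(x0)         # spread of the observations
    s2 = pstdev(residuals)  # spread of the residuals
    e_bar = mean(residuals)
    c = s2 / s1
    p = sum(abs(e - e_bar) < 0.6745 * s1 for e in residuals) / len(residuals)
    return c, p

# Synthetic, roughly exponential deformation readings (not real data).
observed = [2.0 * 1.1 ** k for k in range(6)]
a, b = gm11_fit(observed)
fitted = gm11_predict(observed, a, b, len(observed))
c, p = posterior_check(observed, fitted)
forecast = gm11_predict(observed, a, b, len(observed) + 1)[-1]
```

Under the commonly used grading, C below 0.35 together with P above 0.95 indicates the best accuracy class, which is consistent with the values reported in the abstract.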