Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 47 Issue 9, 15 September 2020
  
Computer Software
CodeSearcher: Code Query Using Functional Descriptions in Natural Languages
LU Long-long, CHEN Tong, PAN Min-xue, ZHANG Tian
Computer Science. 2020, 47 (9): 1-9.  doi:10.11896/jsjkx.191200170
When a developer needs to implement a function but does not know how to implement it in a specific programming language, he/she usually performs a code query using natural language. Performing code queries while programming is time-consuming and labor-intensive. A number of code query tools have been proposed over the past years to assist developers, but most of these approaches require complex inputs or have low precision. We propose a new code query approach based on natural language descriptions, called CodeSearcher. Relying on 〈natural language description, code snippet〉 data pairs extracted from Stack Overflow, a software-development Q&A website, we design a neural network model and the corresponding training method to map "natural language descriptions" and "code snippets" into the same vector space. CodeSearcher differs from conventional code query systems. On the one hand, it accepts any user-provided code base for searching, because the system relies only on the source code itself and does not depend on comments or descriptions of the source code. On the other hand, it no longer limits the code query process to "entering a natural language description and feeding back code snippets", but extends it with a code Q&A step that helps users pick the appropriate code snippet via characteristic keywords, so that developers do not have to read all returned code snippets in detail. The experimental results show that CodeSearcher achieves high precision compared with the baseline.
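For illustration only, the sketch below shows the retrieval pattern the abstract describes: queries and code snippets live in one vector space and are ranked by similarity. The encoders here are hypothetical hashing stubs, not the paper's neural model.

import numpy as np

def embed_query(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for the paper's description encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def embed_code(snippet: str, dim: int = 64) -> np.ndarray:
    # Hypothetical stand-in for the paper's code encoder.
    return embed_query("CODE::" + snippet, dim)

def search(query: str, corpus: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    q = embed_query(query)
    scored = [(float(q @ embed_code(c)), c) for c in corpus]   # cosine similarity of unit vectors
    return sorted(scored, reverse=True)[:top_k]

if __name__ == "__main__":
    snippets = ["def read_file(p): return open(p).read()",
                "def sort_desc(xs): return sorted(xs, reverse=True)"]
    print(search("read a text file into a string", snippets))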
Cross-project Clone Consistency Prediction via Transfer Learning and Oversampling Technology
OUYANG Peng, LU Lu, ZHANG Fan-long, QIU Shao-jian
Computer Science. 2020, 47 (9): 10-16.  doi:10.11896/jsjkx.200400041
In recent years, as software requirements increase, developers have introduced a large amount of clone code into projects by reusing existing code. As software versions are updated, the clone code changes and may become a burden on software maintenance. Researchers have attempted to use machine learning to predict clone code consistency, helping the software quality assurance team allocate maintenance resources more effectively by predicting whether changes to cloned code will cause additional maintenance costs, thereby improving work efficiency and reducing maintenance costs. However, in the early stage of software development, projects are often not fully evolved, and historical data for constructing an effective predictive model is lacking. Therefore, cross-project clone code consistency prediction methods have been proposed. In this paper, we propose a cross-project clone code consistency prediction method via transfer learning and oversampling technology (CPCCP+). This method maps the test set and training set into a kernel space, reduces the distribution discrepancy of cross-project data by transfer component analysis, and alleviates the class imbalance issue to improve the performance of the cross-project prediction model. In terms of experimental datasets, this paper selects seven open source datasets, which form 42 combinations of cross-project clone code consistency prediction tasks in total. In terms of model performance comparison, the proposed CPCCP+ is compared with the method that only uses the base classifier, with precision, recall and F-measure as evaluation metrics. The experimental results show that CPCCP+ can perform cross-project clone code consistency prediction more effectively.
Analysis of Target Code Generation Mechanism of CompCert Compiler
YANG Ping, WANG Sheng-yuan
Computer Science. 2020, 47 (9): 17-23.  doi:10.11896/jsjkx.200400018
CompCert is a well-known trustworthy C compiler and one of the outstanding representatives of formally verified compilers. In recent years, CompCert has been widely used in many research and development efforts in academia and industry. The current version of the CompCert compiler supports a variety of target architectures. This paper analyzes the target code generation mechanism of the CompCert compiler, mainly introducing its design logic, translation, semantic preservation and code structure. Finally, as a summary, the key points for retargeting the CompCert compiler are given. The paper is helpful for retargeting CompCert, for example, to construct a back-end for an important domestic processor.
Performance Analysis of Randoop Automated Unit Test Generation Tool for Java
LIU Fang, HONG Mei, WANG Xiao, GUO Dan, YANG Zheng-hui, HUANG Xiao-dan
Computer Science. 2020, 47 (9): 24-30.  doi:10.11896/jsjkx.200200116
Automated unit testing is a hotspot in modern software development research. Randoop, an automated unit test case generation tool, is designed for Java and .NET code and generates test cases based on feedback guidance. It is widely used in industry. In order to use Randoop effectively for automated testing, this paper applies empirical software engineering methods to analyze the performance characteristics of Randoop through experiments. Four representative Java open source projects are selected to analyze the code coverage of Randoop-generated test cases, their ability to detect mutants, and the relationship between the effectiveness of Randoop and the time cost and source code structure. The experiments find that Randoop can generate valid test cases in a short time. As the generation time increases, the performance of Randoop-generated test cases increases and tends to stabilize when the test generation time reaches 120 s, with an average mutant coverage of 55.59% and an average mutant kill rate of 28.15%. The performance of the test cases generated by Randoop is related to the structure and complexity of the source code of the tested classes. This paper provides a valuable reference for software testers to use the Randoop tool effectively.
Memory Leak Test Acceleration Based on Script Prediction and Reconstruction
LI Yin, LI Bi-xin
Computer Science. 2020, 47 (9): 31-39.  doi:10.11896/jsjkx.200100075
Memory leaks are a common defect in continuously running software such as cloud applications, web services and middleware. They can affect the stability of software applications, degrade performance and even cause crashes. To clearly observe memory leaks, the test cases targeting them need to execute for a long time in order to generate significant memory pressure, so memory leak testing is expensive. If the execution order of test cases is not optimized, much time may be wasted on test cases that are unlikely to reveal faults before the test cases that really contain memory leaks are found, which seriously reduces the efficiency of fault discovery. In order to make up for the shortcomings of existing techniques and address the fact that memory leaks of long-running Java Web programs are not easy to find, diagnose and repair, this paper studies memory leak detection technology and proposes a machine-learning-based method for predicting memory leak test scripts. The method builds a memory feature model to train on and predict scripts with memory leaks. Based on the trained model, it predicts the risk value of a script causing a memory leak and gives the corresponding parameter scores to guide subsequent script recombination, so that functional test scripts that are more likely to trigger memory leaks can be predicted and obtained. At the same time, a script recombination optimization method is proposed to improve the defect-revealing ability. Testing the predicted and recombined scripts first can accelerate the detection of leak defects. Finally, a case study shows that the proposed framework has strong leak detection ability. The optimized test scripts can detect defects more than twice as fast as common scripts, which accelerates the exposure of memory expansion problems and achieves the purpose of improving test efficiency and ensuring software quality.
Test Case Generation Approach for Data Flow Based on Dominance Relations
JI Shun-hui, ZHANG Peng-cheng
Computer Science. 2020, 47 (9): 40-46.  doi:10.11896/jsjkx.200700021
The design of control flow in programs serves to realize correct data flow, so performing data flow testing is important. By formulating test case generation oriented to the all-uses data flow criterion as a many-objective optimization problem, a genetic algorithm based test case generation approach is proposed. By constructing the control flow graph of the program under test, data flow analysis is performed to compute all the definition-use pairs, which are the testing requirements. Then a many-objective genetic algorithm searches for the optimal solution satisfying the all-uses criterion. An improved fitness function is defined based on dominance relations. The existence of a killing definition, as well as the order of the definition node and use node in the execution path, are taken into consideration to analyze the coverage of a test case with respect to a definition-use pair. Experimental results show that the proposed approach can effectively generate test cases satisfying the all-uses criterion, and compared with other approaches, it improves the coverage percentage and reduces the number of generations.
Deduplication Algorithm of Abstract Syntax Tree in GCC Based on Trie Tree of Keywords
HAN Lei, HU Jian-peng
Computer Science. 2020, 47 (9): 47-51.  doi:10.11896/jsjkx.190600042
The abstract syntax tree text generated when the GCC compiler compiles a C source program contains a lot of redundant information unrelated to the source code. If it is parsed directly, the analysis efficiency and accuracy suffer, and a large amount of storage space is occupied. To address this problem, a redundancy elimination algorithm for the GCC abstract syntax tree based on a keyword Trie tree is proposed. The Trie tree is built from the keywords of the nodes that carry useful information in the abstract syntax tree text, so that useless node information can be filtered out and the compilation results optimized. Compared with the traditional KMP-based redundancy elimination algorithm, the keyword Trie tree algorithm effectively avoids losing useful information nodes such as constants and variables during redundancy removal and ensures data integrity. At the same time, the keyword Trie tree algorithm minimizes comparisons of repeated prefix or suffix strings, saving time and space overhead. This paper selects C source files of different lengths for redundancy elimination experiments, tests the performance of the algorithm and compares it with the traditional KMP algorithm. The experimental results show that the algorithm can greatly improve the redundancy elimination efficiency and precision.
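A minimal sketch of the keyword-Trie filtering idea, assuming a whitelist of node keywords and treating each AST dump line's leading tag as the lookup key; the node names below are illustrative, not GCC's exact vocabulary.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self, keywords):
        self.root = TrieNode()
        for w in keywords:
            node = self.root
            for ch in w:
                node = node.children.setdefault(ch, TrieNode())
            node.is_end = True

    def starts_with_keyword(self, text: str) -> bool:
        # True if some whitelisted keyword is a prefix of this line.
        node = self.root
        for ch in text:
            if node.is_end:
                return True
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

useful = Trie(["function_decl", "var_decl", "integer_cst", "identifier_node"])
ast_lines = ["function_decl name: main", "tree_list purpose: <unused>", "var_decl name: x"]
print([ln for ln in ast_lines if useful.starts_with_keyword(ln)])  # keeps 1st and 3rd lines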
Database & Big Data & Data Science
Survey of Network Representation Learning
DING Yu, WEI Hao, PAN Zhi-song, LIU Xin
Computer Science. 2020, 47 (9): 52-59.  doi:10.11896/jsjkx.190300004
A network is a collection of nodes and edges and is usually represented as a graph. Many complex systems take the form of networks, such as social networks, biological networks and information networks. In order to make network data processing simple and effective, representation learning for the nodes in a network has become a research hotspot in recent years. Network representation learning aims to learn a low-dimensional dense representation vector for each node in the network, which can advance various learning tasks in network analysis such as node classification, network clustering and link prediction. However, most previous works are designed only for plain networks and ignore node attributes. When the network is highly sparse, attributes can be very useful complementary content that helps learn better representations. Therefore, a network embedding should preserve not only the structural information but also the attribute information. In addition, in practical applications, many networks are dynamic and evolve over time with the addition, change and deletion of nodes; meanwhile, similar to the network structure, node attributes also change naturally over time. With the development of machine learning, studies on network representation learning emerge one after another. In this paper, we systematically introduce and summarize the network representation learning methods of recent years.
Natural Language Interface for Databases with Content-based Table Column Embeddings
TIAN Ye, SHOU Li-dan, CHEN Ke, LUO Xin-yuan, CHEN Gang
Computer Science. 2020, 47 (9): 60-66.  doi:10.11896/jsjkx.190800138
Converting natural language into query statements that can be executed on a database is the core problem of intelligent interaction and human-computer dialogue systems, and it is also an urgent need of personalized operation and maintenance systems for urban rail trains. At the same time, it is a difficulty in connecting the underlying application platform with the big data application support platform for new power-supply trains. Existing neural-network-based methods do not utilize the semantically rich table content, or utilize it only partially, which limits the improvement of execution accuracy. This paper studies how to improve the query accuracy of natural language query interfaces when table content is included in the input. Aiming at this problem, this paper proposes a content-based table column embedding method, which embeds table columns by utilizing the content stored in each column. Based on this method, a new structure of the embedding layer is proposed. This paper also proposes a data augmentation method that utilizes table content: it generates new training samples by replacing attribute values in queries with other records from the same column of the table. Finally, experiments are conducted on the WikiSQL dataset for the proposed column embedding and data augmentation methods. The experimental results show that, on the basis of state-of-the-art methods, the two methods improve the query accuracy by 0.6%~0.8% when used separately and by nearly 1% when used together, which proves that the proposed column embedding and data augmentation methods achieve good improvements in execution accuracy.
Efficient Top-k Query Processing on Uncertain Temporal Data
WEI Jian-hua, XU Jian-qiu
Computer Science. 2020, 47 (9): 67-73.  doi:10.11896/jsjkx.190800143
Temporal data is widely used in many applications such as medicine, economics and e-commerce, and its uncertainty is mainly caused by factors such as inaccurate measurement techniques. This paper studies top-k queries over uncertain temporal data. Such a query returns the top-k intervals with the largest scores, which are calculated by a function combining the original weight of the data and the probability of intersection with the query data. To answer the query efficiently, this paper proposes a 2D R-tree based on the relational model together with auxiliary structures. The relational model is used to manage all intervals, and the auxiliary structure is used to manage the order of the weights of each node in the R-tree. Based on the proposed index structure, a query algorithm that accesses data in descending order of weight is proposed. It traverses the R-tree from the root node; for each node that intersects the query point, the item with the largest weight in it can be found according to the information stored in the auxiliary structure and is determined as the next object to access. This paper uses synthetic datasets with sizes ranging from 300 thousand to 10 million intervals, and a real flight-information dataset with 3.2 million intervals. In the extensible database system SECONDO, the proposed method is compared with the unindexed method, the R-tree and the interval tree, using the average number of I/O accesses and the CPU time as experimental indicators. The experimental results show that the proposed approach outperforms the baseline methods by 2 to 3 orders of magnitude on 10 million intervals. Comparing the probabilities and weights of the k results with those of all intersecting data, it is found that the probabilities and weights of the k results are close to the maximum values of the actually intersecting data, so the proposed algorithm is feasible and effective.
Method for Simulating and Verifying NVM-based In-memory File Systems
WANG Xin-xin, ZHUGE Qing-feng, WU Lin
Computer Science. 2020, 47 (9): 74-80.  doi:10.11896/jsjkx.190700037
Most existing NVM-based file systems conduct experiments by simulating NVM with DRAM. However, they ignore the differences between NVM and DRAM, making it impossible to accurately reflect the performance and wear distribution of file systems on NVM devices. The accuracy and interfaces of existing NVM emulators are not sufficient to support the simulation requirements of NVM-based file systems. This paper proposes a method for simulating NVM write latency and verifying the wear distribution of NVM-based file systems. The accuracy of latency simulation is improved by injecting software-created delay according to the I/O characteristics of the file system. To depict the wear distribution of NVM physical devices caused by the NVM-based file system, every update operation on an NVM page is tracked. Experimental results show that the proposed method reduces the error rate of write latency simulation by 65% on average, while accurately reflecting the wear distribution of NVM.
Topic-Location-Category Aware Point-of-interest Recommendation
MA Li-bo, QIN Xiao-lin
Computer Science. 2020, 47 (9): 81-87.  doi:10.11896/jsjkx.191100120
With the continuous development of location-based social networks (LBSN), point-of-interest (POI) recommendation, which helps users explore new locations and merchants discover potential customers, has received widespread attention. However, due to the high sparsity of users' check-in data, POI recommendation faces serious challenges. To cope with this challenge, this paper exploits textual information, geographic information and category information, effectively incorporating interest topics, geographical influence and category preference, and proposes a topic-location-category aware collaborative filtering algorithm called TGC-CF for POI recommendation. The proposed algorithm uses the Latent Dirichlet Allocation (LDA) model to learn the interest topic distribution of users and calculates the similarity of interest topic distributions among users by mining textual information associated with POIs, models geographical influence by combining geographic distance and users' regional preference, uses the TF-IDF statistical method to assess the target user's preference for a category while considering the impact of other users' category preferences in the recommendation process, and finally integrates these influencing factors into a collaborative filtering recommendation model to generate a list of POIs that users are interested in. Experimental results on two real data sets show that the TGC-CF algorithm performs better than other recommendation algorithms.
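As a rough sketch of two of the ingredients named above, the toy code below derives user interest topics with LDA and a TF-IDF-style category preference vector; the fusion into a collaborative filtering model, and the data layout, are assumptions rather than the paper's exact TGC-CF.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Each "document" aggregates the text of POIs a user has checked in at.
user_docs = ["coffee dessert bakery latte", "hiking trail mountain lake", "coffee brunch bakery"]
counts = CountVectorizer().fit_transform(user_docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
topic_sim = cosine_similarity(theta)          # user-user similarity over interest topics

# Category preference: each "document" is the category sequence of a user's check-ins.
user_cats = ["cafe cafe bakery", "park park trail", "cafe bakery bakery"]
cat_pref = TfidfVectorizer().fit_transform(user_cats).toarray()

print(np.round(topic_sim, 2))
print(np.round(cat_pref, 2))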
CSR-based PageRank Algorithm on Historical Graphs
PAN Pei-xian, ZOU Zhao-nian, LI Fa-ming
Computer Science. 2020, 47 (9): 88-93.  doi:10.11896/jsjkx.190800122
In recent years, research on static graphs has become more and more comprehensive and in-depth, and a mature theoretical system has been formed. However, for some application problems in real life, such as the changing relationships in social networks, static graphs are too weak to represent such constantly changing situations. Historical graphs can be used to represent dynamic changes. The PageRank algorithm measures the importance of web pages, and websites are constantly being added to or deleted from the network; considering all these factors, such a network is quite appropriately represented by a historical graph. Therefore, this paper uses the CSR (Compressed Sparse Row) structure to implement PageRank on historical graphs, so that the program can give the score of each website at several target times. In turn, it can provide changes in website ratings and forecasts of website influence trends. Comparing its performance with the PageRank algorithm implemented on linked lists, based on the hyperlink network dataset provided by Wikipedia, the results show that its performance is much better than that of the linked list structure, and its advantage becomes more and more obvious as the data size and the number of target times increase.
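A minimal sketch of PageRank by power iteration on a scipy CSR adjacency matrix; rerunning it on each snapshot of a historical graph would give per-time scores. The 4-node toy graph and the snapshot handling are illustrative, not the paper's setup.

import numpy as np
from scipy.sparse import csr_matrix

def pagerank_csr(adj: csr_matrix, d: float = 0.85, iters: int = 100) -> np.ndarray:
    n = adj.shape[0]
    out_deg = np.asarray(adj.sum(axis=1)).ravel()
    out_deg[out_deg == 0] = 1.0                       # dangling nodes: avoid division by zero
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * adj.T.dot(r / out_deg)  # distribute rank along out-links
    return r

edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(4, 4))
print(pagerank_csr(A))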
Database Anomaly Access Detection Based on Principal Component Analysis and Random Tree
FENG An-ran, WANG Xu-ren, WANG Qiu-yun, XIONG Meng-bo
Computer Science. 2020, 47 (9): 94-98.  doi:10.11896/jsjkx.190800056
As a platform for data storage and interaction, a database contains confidential and important information, making it a target of malicious attacks. To prevent attacks from outsiders, database administrators can limit unauthorized user access through role-based access control, while masquerade attacks from insiders are often less noticeable. Therefore, research on database anomaly detection based on user behavior has important practical value. A user anomaly detection algorithm, PCA-RT, based on Principal Component Analysis (PCA) and Random Tree (RT) is proposed for detecting anomalous database access behavior. Firstly, user profiles are constructed according to the characteristics of the queries submitted by the users; then principal component analysis is applied to reduce the dimension of the user profiles and select features; finally, a random tree is trained as the anomaly detector. Experiments based on a dataset constructed according to TPC-E, the new-generation database performance evaluation standard issued by the TPC (Transaction Processing Performance Council), show that the user profile and PCA-RT are fast and effective for detecting anomalous database user access behavior. The PCA step reduces the data by more than 35% during preprocessing, and the accuracy and recall of the PCA-RT algorithm are improved by 1.78% and 9.76% respectively. This proves that the construction method of the user profile vector and the PCA-RT algorithm are effective for detecting anomalous user access behavior on the TPC-E database.
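A minimal sketch of the PCA-then-random-tree pipeline on synthetic profile vectors. ExtraTreeClassifier is a single randomized tree used here as a stand-in; the paper's exact random tree variant and its TPC-E-derived features are not reproduced.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.tree import ExtraTreeClassifier

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(200, 20))      # profiles of legitimate queries
X_anom = rng.normal(3.0, 1.0, size=(20, 20))         # profiles of anomalous queries
X = np.vstack([X_normal, X_anom])
y = np.array([0] * 200 + [1] * 20)

model = make_pipeline(PCA(n_components=0.65),          # keep enough components for ~65% variance
                      ExtraTreeClassifier(random_state=0))
model.fit(X, y)
print(model.predict(rng.normal(3.0, 1.0, size=(3, 20))))  # should be flagged as anomalous (1)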
High-order Multi-view Outlier Detection
ZHONG Ying-yu, CHEN Song-can
Computer Science. 2020, 47 (9): 99-104.  doi:10.11896/jsjkx.200600170
Due to the complex distribution of data across different views, traditional single-view outlier detection methods are no longer applicable to the detection of multi-view outliers, making multi-view outlier detection a challenging research topic. Multi-view outliers can be divided into three types: attribute outliers, class outliers and class-attribute outliers. Existing methods use pairwise constraints across views to learn new feature representations and define outlier scoring metrics based on these features; they do not take full advantage of the interaction information between views and suffer from high computational complexity when facing three or more views. Therefore, this paper reshapes multi-view data into tensor-set form, defines high-order multi-view outliers, and proves that all three existing types of multi-view outliers meet the definition of high-order multi-view outliers, so as to propose a new multi-view outlier detection algorithm called the high-order multi-view outlier detection algorithm (HOMVOD). Specifically, the algorithm first reshapes the multi-view data into tensor-set form, then learns its low-rank representation, and finally designs an outlier function under the tensor representation to realize detection. Experiments on UCI datasets show that this method is superior to existing methods in detecting multi-view outliers.
Short Term Load Forecasting via Zoneout-based Multi-time Scale Recurrent Neural Network
ZHUANG Shi-jie, YU Zhi-yong, GUO Wen-zhong, HUANG Fang-wan
Computer Science. 2020, 47 (9): 105-109.  doi:10.11896/jsjkx.190800030
With accurate power load forecasting, smart grids can provide more efficient, reliable and environmentally friendly power services than traditional grids. In real life, power load data often has a high temporal correlation with historical data, while traditional neural networks pay little attention to this. In recent years, the recurrent neural network (RNN) has received more and more attention in power load forecasting, because it can capture correlations between data across large time scales. However, due to the unique self-connection structure of RNNs, when back-propagation through time (BPTT) is adopted for network training, problems such as vanishing gradients are prone to occur as the number of network layers increases, resulting in a decrease in prediction accuracy. There are various RNN architectures that can solve the vanishing gradient problem, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), but their complex internal structure increases the training time. In order to solve the above problems, this paper first analyzes and studies the RNN and its variants, and then combines the Zoneout function to design a multi-time-scale modularized RNN architecture, focusing on the update strategy of the hidden layer modules. It not only effectively solves the vanishing gradient problem, but also greatly reduces the number of network parameters that need to be trained. Experimental results based on a benchmark dataset and a real-world load dataset show that this architecture achieves better performance than currently popular RNN architectures.
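The core zoneout idea referenced above, sketched in one hidden-state step: with probability z, a unit keeps its previous value instead of the new candidate. The tanh cell and shapes are illustrative, not the paper's multi-time-scale architecture.

import numpy as np

def zoneout_step(h_prev, x, W_h, W_x, b, z=0.15, training=True, rng=None):
    h_tilde = np.tanh(W_h @ h_prev + W_x @ x + b)      # candidate hidden state
    if training:
        rng = rng or np.random.default_rng()
        keep = rng.random(h_prev.shape) < z            # units frozen at their previous value
        return np.where(keep, h_prev, h_tilde)
    return z * h_prev + (1 - z) * h_tilde              # expected behaviour at test time

rng = np.random.default_rng(0)
H, D = 8, 4
h = np.zeros(H)
W_h = rng.standard_normal((H, H)) * 0.1
W_x = rng.standard_normal((H, D)) * 0.1
b = np.zeros(H)
for x in rng.standard_normal((5, D)):                  # run a short input sequence
    h = zoneout_step(h, x, W_h, W_x, b, rng=rng)
print(h)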
Big Data Valuation Algorithm
ZHAO Hui-qun, WU Kai-feng
Computer Science. 2020, 47 (9): 110-116.  doi:10.11896/jsjkx.191000156
With the rapid development of information technology, data generation shows an exponential growth trend. Big data has become one of the most frequently used terms due to its rapid emergence and great value; it is not only an academic term but has gradually become a commodity name. Whether for academic research or data trading, how to evaluate the availability of big data sets is a new issue. This paper proposes a big data usability evaluation model to provide a reference for academia and the data circulation field. Combined with the 4V (Volume, Variety, Velocity, Value) characteristics of big data, the 4V characteristic distribution of the statistical data is segmented, which gives a probability model of big data based on the piecewise distribution, and an availability and weighted evaluation model for large data sets. An algorithm for block sampling of big data and an estimation algorithm for the weighting coefficients of each characteristic in the evaluation model are proposed. Combined with the data availability evaluation requirements in video big data analysis, the specific applications of the proposed models and algorithms are demonstrated. The big data usability evaluation model can be used for data evaluation in data science experiments, and also for data set pricing in big data transaction markets. For practical evaluation work, the paper discusses how to standardize (commercialize) data sets and how to determine evaluation benchmarks for the video field at the operational level. The application case supports the proposed model and further tests its feasibility.
Computer Graphics & Multimedia
Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data
MA Meng-yu, WU Ye, CHEN Luo, WU Jiang-jiang, LI Jun, JING Ning
Computer Science. 2020, 47 (9): 117-122.  doi:10.11896/jsjkx.190800121
Rapid visualization of large-scale geographic vector data remains a challenging problem in geographic information science. In existing visualization methods, the computational scale expands rapidly with the data volume, so it is difficult to provide real-time visualization for large-scale geographic vector data even when parallel acceleration technologies are adopted. This paper presents a display-oriented data visualization method for large-scale geographic vector data. Different from traditional methods, the core task of the display-oriented method is to determine the pixel range according to the screen display and then calculate the value of each pixel in that range. As the number of pixels for display is stable, the display-oriented data visualization method is less sensitive to data volume and can provide real-time visualization for large-scale geographic vector data. Experiments show that the approach is capable of handling 100-million-scale geographic vector data.
Action-related Network: Towards Modeling Complete Changeable Action
HE Xin, XU Juan, JIN Ying-ying
Computer Science. 2020, 47 (9): 123-128.  doi:10.11896/jsjkx.190800101
When modeling complete actions in video, the commonly used method is the temporal segment network (TSN), but TSN cannot fully capture information about how the action changes. In order to fully explore the change information of an action in the time dimension, the Action-Related Network (ARN) is proposed. Firstly, the BN-Inception network is used to extract features of the action in the video; then the extracted video segment features are combined with the features output by a Long Short-Term Memory (LSTM) network, and finally classified. With this approach, ARN can take into account both the static and dynamic information of the action. Experiments show that on the general dataset HMDB-51, the recognition accuracy of ARN is 73.33%, which is 7% higher than that of TSN. When the action information is increased, the recognition accuracy of ARN is 10% higher than that of TSN. On the Something-Something V1 dataset, which contains more action changes, the recognition accuracy of ARN is 28.12%, which is 51% higher than that of TSN. Finally, on some action categories of the HMDB-51 dataset, this paper further analyzes how the recognition accuracy of ARN and TSN changes when more complete action information is used; the recognition accuracy of ARN is higher than that of TSN by 10 percentage points. It can be seen that ARN makes full use of the complete action information through the changes of the associated action, thereby effectively improving the recognition accuracy of changing actions.
Multi-branch Convolutional Neural Network for Lung Nodule Classification and Its Interpretability
ZHANG Jia-jia, ZHANG Xiao-hong
Computer Science. 2020, 47 (9): 129-134.  doi:10.11896/jsjkx.190700203
The characteristics of lung nodules are complex and diverse, which makes them difficult to classify. Although more and more deep learning models are applied to the lung nodule classification task of computer-aided lung cancer diagnosis systems, the "black box" nature of these models cannot explain what knowledge the model has learned from the data and how that knowledge influences the decision, leading to a lack of reliability in the diagnosis results. To this end, an interpretable multi-branch convolutional neural network model is proposed to identify benign and malignant lung nodules. The model uses the semantic features of pulmonary nodules that radiologists use in diagnosis to assist in identifying benign and malignant lung nodules. These characteristics are combined with the malignancy classification branch into a multi-branch network. Then, beyond the malignancy classification, the model can predict nodule attributes, which can potentially explain the diagnosis result. Experimental results on the LIDC-IDRI dataset show that, compared with existing methods, the proposed model can not only provide interpretable diagnostic results, but also achieve better classification of lung nodules, with an accuracy of 97.8%.
Single Image Super-resolution Algorithm Using Residual Dictionary and Collaborative Representation
TIAN Xu, CHANG Kan, HUANG Sheng, QIN Tuan-fa
Computer Science. 2020, 47 (9): 135-141.  doi:10.11896/jsjkx.190600146
The traditional single image super resolution (SR) algorithms usually generate high resolution (HR) images with insufficient high-frequency information and blurred edges. To improve the quality of the reconstructed HR images, this paper proposes a single image SR algorithm using residual dictionaries and collaborative representation (RDCR). In the training phase, firstly, based on the ideas of dictionary learning and collaborative representation, a main dictionary and the corresponding main projection matrices are learned. After that, the reconstructed image samples are utilized to train multiple layers of residual dictionaries and residual projection matrices. In the testing phase, high-frequency information is gradually refined by reconstructing the residual information layer by layer. Extensive experimental results show that, at a scale factor of 4, the average peak signal-to-noise ratio (PSNR) values obtained by the proposed method on Set5 and Set14 are 0.20 dB and 0.18 dB higher than those of the traditional method A+, respectively, while the running time of the proposed method is close to that of A+.
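A minimal sketch of the collaborative-representation projection used in this family of SR methods: with paired LR/HR dictionaries, a ridge-regularized projection maps an LR patch feature straight to HR space. The dictionaries here are random placeholders and the residual-layer training is omitted.

import numpy as np

rng = np.random.default_rng(0)
n_atoms, d_lr, d_hr, lam = 128, 36, 81, 0.1
D_l = rng.standard_normal((d_lr, n_atoms))             # LR dictionary (columns are atoms)
D_h = rng.standard_normal((d_hr, n_atoms))             # paired HR dictionary

# alpha* = argmin ||y - D_l a||^2 + lam ||a||^2  =>  a = (D_l^T D_l + lam I)^-1 D_l^T y
P = np.linalg.solve(D_l.T @ D_l + lam * np.eye(n_atoms), D_l.T)   # precomputed offline
project_to_hr = D_h @ P                                 # one matrix applied per patch at test time

y_lr = rng.standard_normal(d_lr)                        # an LR patch feature
x_hr = project_to_hr @ y_lr
print(x_hr.shape)                                       # (81,) -> reconstructed HR feature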
Expression Animation Synthesis Based on Improved CycleGan Model and Region Segmentation
YE Ya-nan, CHI Jing, YU Zhi-ping, ZHAN Yu-li, ZHANG Cai-ming
Computer Science. 2020, 47 (9): 142-149.  doi:10.11896/jsjkx.190900203
Aiming at the problems that existing facial expression synthesis methods mostly rely on a driving data source, have low generation efficiency and produce unconvincing results, this paper proposes a new method for expression animation synthesis based on an improved CycleGan model and region segmentation. The new method can synthesize new expressions in real time and has good stability and robustness. It constructs a new covariance constraint in the cycle-consistency loss function of the traditional CycleGan model, which effectively avoids color anomalies and image blurring when generating new expression images. The idea of zonal training is put forward: the Dlib face recognition library is used to detect the key points of the face images, and the detected key feature points are used to segment the faces in the source domain and target domain into four zones, namely the left eye, the right eye, the mouth and the rest of the face. The improved CycleGan model is trained on each region separately, and finally the training results are weighted and fused into the final new expression image. The zonal training further enhances the authenticity of expression synthesis. The experimental data come from the SAVEE database, and the experimental results are produced with Python 3.4 under the Tensorflow framework. Experiments show that the new method can directly generate real and natural new expression sequences in real time from the original facial expression sequence without a driving data source. Furthermore, for voice video, it effectively ensures synchronization between the generated facial expression sequence and the source audio.
No-reference Stereo Image Quality Assessment Based on Disparity Information
ZHU Ling-ying, SANG Qing-bing, GU Ting-ting
Computer Science. 2020, 47 (9): 150-156.  doi:10.11896/jsjkx.190700213
In recent years, with the rapid development of deep learning in the field of image quality assessment (IQA), 2D-IQA has improved considerably, but 3D-IQA still needs improvement. Therefore, using a three-branch convolutional neural network, this paper proposes a no-reference stereo image quality assessment method based on disparity information and analyzes the influence of different disparity maps on the performance of the model. The algorithm takes left/right view patches and disparity map patches as input, automatically extracts features, and obtains a regression model through training to predict the quality of stereo images. Five different stereo matching algorithms are used to generate disparity maps, and the experimental results show that the SAD algorithm performs best. The experimental results on the stereo image databases LIVE3D and MCL3D show that the method is suitable not only for evaluating symmetrically distorted images but also for evaluating asymmetrically distorted stereo images. The overall distortion results of this method are superior to other comparison algorithms. In particular, on the MCL3D image database, the PLCC and SROCC of the proposed method are 1% and 4% higher than those of other methods. The experimental data show that the proposed model improves the performance of stereo image quality assessment and is highly consistent with human subjective perception.
Cascaded Siamese Network Visual Tracking Based on Information Entropy
ZHAO Qin-yan, LI Zong-min, LIU Yu-jie, LI Hua
Computer Science. 2020, 47 (9): 157-162.  doi:10.11896/jsjkx.190800160
Visual tracking is an important research direction in the field of computer vision. In view of problems such as the poor robustness of current algorithms to changes in object appearance, this paper proposes a cascaded Siamese network visual tracking method based on information entropy. Firstly, deep convolution features are extracted from the first-frame target template and the area to be detected in the current frame by using the Siamese network, and the response map is calculated by correlation. Then, the quality of the response map is evaluated according to the defined information entropy and average peak coefficient, and for response maps of poor quality, the model factor of the convolution feature is updated. Finally, the final response map is used to determine the target position and calculate the optimal scale. The experimental results on the VOT2016 and VOT2017 datasets show that the proposed method is superior to other algorithms while maintaining real-time operation.
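For illustration, the snippet below computes two common response-map quality cues of the kind mentioned above, Shannon entropy of the normalized response and an average peak-to-correlation energy (APCE) score, on a toy map; the exact quality measure and update rule of the paper are not reproduced.

import numpy as np

def response_quality(R: np.ndarray):
    p = R - R.min()
    p = p / (p.sum() + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12))            # low entropy -> sharp, confident peak
    apce = (R.max() - R.min()) ** 2 / (np.mean((R - R.min()) ** 2) + 1e-12)
    return entropy, apce

rng = np.random.default_rng(0)
g = np.exp(-((np.arange(31) - 15) ** 2) / 4.0)
sharp = g[None, :] * g[:, None]                          # clean single-peak response
noisy = rng.random((31, 31))                             # low-confidence response
print(response_quality(sharp))                           # lower entropy, higher APCE
print(response_quality(noisy))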
Fast Face Recognition Based on Deep Learning and Multiple Hash Similarity Weighting
DENG Liang, XU Geng-lin, LI Meng-jie, CHEN Zhang-jin
Computer Science. 2020, 47 (9): 163-168.  doi:10.11896/jsjkx.190900118
Whether traditional methods or neural networks are used for face recognition, there are problems of heavy computation and long computation time, and it is difficult to detect and match the faces in a video in real time. Aiming at these problems, using a lightweight neural network for face detection, a simple hash algorithm to calculate the similarity of face images, and a weighted combination of multiple hash similarity values for face matching is a feasible scheme for reducing computation time and realizing fast face recognition. The lightweight neural network Mobilenet is used as the face feature extraction network, and a pruned SSD model is used as the detection network; face detection is realized by cascading Mobilenet and SSD, and the detected face image is then recognized. Firstly, the mean hash similarity and the perceptual hash similarity of the face images are calculated separately. Then, taking α and β as the weighting coefficients of the mean hash and the perceptual hash respectively, the two similarity values are weighted and the result is taken as the final similarity of the image. When the weighted similarity value is greater than the threshold I, the faces are considered to be the same person; when it is less than the threshold K, they are considered to be different persons. Images whose similarity is between thresholds I and K are optimally matched in order of similarity value from high to low. The face detection accuracy of the proposed method reaches 92.5% on WiderFace and 94.2% on FDDB, with an average processing time of 56 ms per image. The face matching accuracy on the ORL standard face database reaches 96.2%. In a real-time face recognition test with a camera, the face recognition accuracy of the proposed method is 95% and the average recognition time is 80 ms. Experiments prove that real-time face detection and matching can be realized while ensuring high accuracy.
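A sketch of the weighted hash match described above, under common definitions of the mean (average) hash and a DCT-based perceptual hash: each hash gives a 64-bit fingerprint, similarity is the fraction of matching bits, and the two similarities are fused with weights α/β. The weights, thresholds and file names are placeholders.

import numpy as np
from PIL import Image
from scipy.fft import dct

def average_hash(img: Image.Image) -> np.ndarray:
    a = np.asarray(img.convert("L").resize((8, 8)), dtype=float)
    return (a > a.mean()).flatten()

def perceptual_hash(img: Image.Image) -> np.ndarray:
    a = np.asarray(img.convert("L").resize((32, 32)), dtype=float)
    low = dct(dct(a, axis=0, norm="ortho"), axis=1, norm="ortho")[:8, :8]
    return (low > np.median(low)).flatten()

def weighted_similarity(img1, img2, alpha=0.5, beta=0.5) -> float:
    s_a = np.mean(average_hash(img1) == average_hash(img2))      # bit-agreement of mean hashes
    s_p = np.mean(perceptual_hash(img1) == perceptual_hash(img2))
    return alpha * s_a + beta * s_p

# Usage (paths are placeholders):
# same_person = weighted_similarity(Image.open("face_a.jpg"), Image.open("face_b.jpg")) > 0.9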
Improved Sequence-to-Sequence Model for Short-term Vessel Trajectory Prediction Using AIS Data Streams
YOU Lan, HAN Xue-wei, HE Zheng-wei, XIAO Si-yu, HE Du, PAN Xiao-meng
Computer Science. 2020, 47 (9): 169-174.  doi:10.11896/jsjkx.190800060
Using deep learning to predict vessel trajectories is of great significance for intelligent shipping. AIS (Automatic Identification System) data contain a huge amount of information about vessel trajectory features, and the prediction of ship trajectories based on AIS data has become one of the research hotspots in the intelligent shipping field. In this paper, an improved sequence-to-sequence model using AIS data streams is proposed for short-term vessel trajectory prediction. The proposed model utilizes a GRU network to encode the historical spatio-temporal sequence into a context vector, which not only preserves the sequential relationship among the trail points but also helps alleviate the vanishing gradient problem. Meanwhile, a GRU network is used as the decoder to output the target sequence of trail points. A large amount of real AIS data is used in the experiments, and the Chongqing and Wuhan sections of the Yangtze River are selected as typical experimental areas to evaluate the validity and applicability of the model. Experimental results show that the proposed model improves the accuracy and efficiency of short-term ship trajectory prediction and provides an effective solution for future intelligent shipping warning.
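A minimal GRU encoder-decoder sketch for trajectory-like sequences. The layer sizes, the (lon, lat, speed, course)-style feature layout and the decoding scheme are assumptions; only the encode-to-context / decode-step-by-step pattern matches the description above.

import torch
import torch.nn as nn

class TrajSeq2Seq(nn.Module):
    def __init__(self, n_feat=4, hidden=64, horizon=5):
        super().__init__()
        self.encoder = nn.GRU(n_feat, hidden, batch_first=True)
        self.decoder = nn.GRUCell(n_feat, hidden)
        self.out = nn.Linear(hidden, n_feat)
        self.horizon = horizon

    def forward(self, history):                  # history: (batch, T, n_feat)
        _, h = self.encoder(history)             # context vector summarizing the past track
        h = h.squeeze(0)
        step = history[:, -1, :]                 # start decoding from the last observed point
        preds = []
        for _ in range(self.horizon):
            h = self.decoder(step, h)
            step = self.out(h)
            preds.append(step)
        return torch.stack(preds, dim=1)         # (batch, horizon, n_feat)

model = TrajSeq2Seq()
print(model(torch.randn(2, 10, 4)).shape)        # torch.Size([2, 5, 4])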
Ship Trajectory Classification Method Based on 1DCNN-LSTM
CUI Tong-tong, WANG Gui-ling, GAO Jing
Computer Science. 2020, 47 (9): 175-184.  doi:10.11896/jsjkx.191000162
Due to the limited field of view and the cost of monitoring equipment, ship classification methods based on images or videos are not very effective, so there is an urgent need to improve ship classification methods and their accuracy. In recent years, with the wide use of various trajectory data acquisition systems, it has become possible to classify ship types through ship trajectory data. To address the problem that the traditional two-dimensional convolutional neural network lacks the ability of feature compression and temporal feature expression in ship trajectory recognition, this paper proposes a hybrid model that combines a one-dimensional convolutional neural network (1DCNN) with long short-term memory (LSTM). The model identifies ship types using data collected from the Automatic Identification System (AIS). Firstly, this paper preprocesses the ship trajectory data collected by AIS to filter out noisy data. Secondly, to address the problem that the features hidden in the original ship trajectory information are too obscure for a 1DCNN, this paper proposes an algorithm that constructs, from a large number of ship trajectories, the trajectory distribution feature vectors acceptable to a 1DCNN; on this basis, the algorithm extracts the time series feature vectors acceptable to the LSTM. Finally, this paper combines the trained 1DCNN model and LSTM model to obtain a hybrid ship classification model. Based on the AIS data of the Bohai Sea area in June 2016, the hybrid 1DCNN-LSTM model is used to classify five typical ship types: fishing ships, passenger ships, tankers, container ships and bulk carriers. The experimental results show that, compared with methods that use a single neural network such as LSTM as the classifier, the proposed method is clearly effective and is an effective ship trajectory classification method.
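A sketch of a 1DCNN+LSTM hybrid over a trajectory feature sequence, to make the architecture concrete. The channel counts, kernel sizes and five-class head are placeholders, and the paper's separate training of the two sub-models is not reproduced.

import torch
import torch.nn as nn

class Conv1dLSTMClassifier(nn.Module):
    def __init__(self, n_feat=6, n_classes=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feat, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                        # x: (batch, T, n_feat)
        z = self.conv(x.transpose(1, 2))         # Conv1d expects (batch, channels, T)
        _, (h, _) = self.lstm(z.transpose(1, 2)) # back to (batch, T', channels) for the LSTM
        return self.fc(h.squeeze(0))             # logits over ship types

model = Conv1dLSTMClassifier()
print(model(torch.randn(8, 60, 6)).shape)        # torch.Size([8, 5])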
Artificial Intelligence
MTHAM: Multitask Disease Progression Modeling Based on Hierarchical Attention Mechanism
PAN Zu-jiang, LIU Ning, ZHANG Wei, WANG Jian-yong
Computer Science. 2020, 47 (9): 185-189.  doi:10.11896/jsjkx.190900001
Alzheimer’s disease (AD) is an irreversible neurodegenerative disease; the degeneration of brain tissue causes serious cognitive problems and eventually leads to death. Many clinical trials and research projects study AD pathology and produce data for analysis. This paper focuses on the diagnosis of AD and the prediction of potential prognosis in combination with a variety of clinical features. A multitask disease progression model based on a hierarchical attention mechanism is proposed. The task of automatic disease diagnosis is regarded as the main task, and the task of disease prognosis is regarded as an auxiliary task that improves the generalization ability of the model and thus the performance of the automatic diagnosis task. Two layers of attention are applied at the feature level and the medical record level respectively, so that the model can pay different amounts of attention to different features and different medical records. Validation experiments are carried out on the ADNI (Alzheimer’s Disease Neuroimaging Initiative) dataset. Compared with several benchmark models, the experimental results show that the proposed method achieves better performance and provides better robustness for clinical application.
Active Label Distribution Learning Based on Marginal Probability Distribution Matching
DONG Xin-yue, FAN Rui-dong, HOU Chen-ping
Computer Science. 2020, 47 (9): 190-197.  doi:10.11896/jsjkx.200700077
Label distribution learning (LDL) is a new learning paradigm for learning from instances labeled with label distributions, and in recent years it has been successfully applied to real-world scenarios such as facial age estimation, head pose estimation and emotion recognition. In label distribution learning, enough data labeled with label distributions is needed to train a model with good prediction performance. However, label distribution learning sometimes faces the dilemma that labeled data is insufficient and that labeling enough label distribution data implies a high annotation cost. The active label distribution learning based on marginal probability distribution matching (ALDL-MMD) algorithm is designed to solve the problem of high annotation cost in label distribution learning by reducing the amount of labeled data required to train the model and thus the annotation cost. The ALDL-MMD algorithm trains a linear regression model. While ensuring a minimal training error of the linear regression model, it learns a sparse vector reflecting which instances in the unlabeled data set are selected, so that the data distributions of the training set and the unlabeled set after instance selection are as similar as possible; the vector is relaxed for ease of calculation. An effective method for optimizing the objective function of ALDL-MMD is given, and a proof of the convergence of ALDL-MMD is provided. The experimental results on multiple label distribution data sets show that the ALDL-MMD algorithm is superior to existing active instance selection methods on the two evaluation measures "Canberra Metric" (a distance) and "Intersection" (a similarity), which measure how accurately the label distribution of an instance is predicted, reflecting its effectiveness in reducing annotation cost.
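As a small illustration of the distribution-matching idea, the snippet below computes an empirical (linear-kernel) MMD between a selected subset and the unlabeled pool, i.e. the squared distance between the two sample means; the optimization over the sparse selection vector described above is omitted.

import numpy as np

def mmd_linear(X_sel: np.ndarray, X_pool: np.ndarray) -> float:
    return float(np.sum((X_sel.mean(axis=0) - X_pool.mean(axis=0)) ** 2))

rng = np.random.default_rng(0)
pool = rng.standard_normal((500, 10))
random_subset = pool[rng.choice(500, 50, replace=False)]
shifted_subset = random_subset + 1.0              # deliberately mismatched selection
print(mmd_linear(random_subset, pool))            # small: marginal distributions match
print(mmd_linear(shifted_subset, pool))           # large: selection distorts the distribution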
Construction of Semantic Mapping in Dynamic Environments
QI Shao-hua, XU He-gen, WAN You-wen, FU Hao
Computer Science. 2020, 47 (9): 198-203.  doi:10.11896/jsjkx.191000040
Three-dimensional semantic maps play a key role in tasks such as robot navigation, path planning, intelligent grasping and human-computer interaction, so constructing 3D semantic maps in real time is especially important. Current SLAM (simultaneous localization and mapping) algorithms can achieve high positioning and mapping accuracy. However, how to eliminate dynamic objects to obtain higher positioning accuracy in a dynamic environment, and how to understand which objects exist in the surrounding scene and where they are located, are still not well solved. This paper presents an algorithm for constructing semantic maps in a dynamic environment. The algorithm improves on ORB-SLAM2: a dynamic/static point detection algorithm is added to the tracking thread to eliminate feature points detected as dynamic, which improves the positioning accuracy in dynamic environments; an object detection thread is added to detect key frames; Octo-Map dense map construction is added to the mapping thread; and a 3D object database is constructed according to the detection results. To prove the feasibility of the algorithm, the laboratory is used as the test environment, and experiments on object detection, dynamic point detection, 3D object information acquisition and semantic map construction in a dynamic environment are performed. In the object detection experiment, a high-speed and high-precision object detection network, mobilenet-v2-ssdlite, is trained, which reaches a detection speed of 7 frames/s and can basically achieve real-time detection. In dynamic point detection, the optical flow method is used to eliminate dynamic points at a processing speed of 16.5 frames/s, and a dataset is created to evaluate the performance of the algorithm: compared with the original ORB-SLAM2 algorithm, the positioning accuracy is improved by a factor of 5 after combining with the optical flow method. For the acquisition of 3D object information, two methods based on depth filtering and point cloud segmentation are adopted, and the results show that the latter acquires 3D objects more accurately. Finally, a semantic map of the entire laboratory is constructed in a dynamic environment, an Octo-Map dense map is built, and a 3D object database is constructed based on the detection results. The detected object sizes and positions are compared with the true values, and the errors are within 5 cm. The results show that the proposed algorithm has high accuracy and real-time performance.
FastSLAM Algorithm Based on Adaptive Fading Unscented Kalman Filter
WANG Bing-zhou, WANG Hui-bin, SHEN Jie, ZHANG Li-li
Computer Science. 2020, 47 (9): 213-218.  doi:10.11896/jsjkx.190700186
Simultaneous localization and mapping (SLAM) is the main method for realizing autonomous navigation of robots in unknown environments, and the FastSLAM algorithm is a popular solution to the SLAM problem. Due to the sequential importance sampling method used in FastSLAM, a few particles acquire large weights while the weights of most particles become very small throughout the iterative process, which leads to particle degradation. In order to make the particle distribution more accurate and reduce particle degradation, a FastSLAM algorithm based on the adaptive fading unscented Kalman filter (AFUKF) is proposed to improve the estimation accuracy of FastSLAM. Starting from the study of the particles' proposal distribution function, this paper uses the AFUKF instead of the EKF to estimate the proposal distribution of the robot's position, avoiding the linearization error of the EKF. By using the idea of the adaptive fading filter, the proposal distribution is brought closer to the posterior position of the mobile robot, and the degradation of the particle set is alleviated. Simulation results on the MATLAB platform show that the mean square error of position estimation of the proposed method is 28.7% lower than that of standard FastSLAM, i.e. the estimation accuracy is improved by 28.7%, and the proposed method achieves high estimation accuracy compared with related algorithms of recent years. When the number of particles increases, the estimation accuracy of every algorithm improves, and the proposed algorithm still achieves the highest estimation accuracy. The experimental results fully show that the proposed algorithm can calculate the proposal distribution function more accurately and effectively alleviate the particle degradation problem in FastSLAM, which significantly improves the estimation accuracy of the FastSLAM algorithm.
Graph Classification Model Based on Capsule Deep Graph Convolutional Neural Network
LIU Hai-chao, WANG Li
Computer Science. 2020, 47 (9): 219-225.  doi:10.11896/jsjkx.190900044
Aiming at the problem of structure information extraction when a learned graph representation is used for graph classification, a graph classification model based on the fusion of a graph convolutional neural network and a capsule network is proposed. Firstly, the node information in the graph is processed by the graph convolutional neural network, and node representations are obtained after iteration; the sub-tree structure information of each node is contained in its representation. Then, using the idea of the Weisfeiler-Lehman algorithm, the multi-dimensional node representations are sorted to obtain multi-view representations of the graph. Finally, the multi-view graph representations are converted into capsules and input into the capsule network, higher-level classification capsules are obtained through the dynamic routing algorithm, and classification is then performed. The experimental results show that the classification accuracy of the proposed model is improved by 1%~3% on public datasets and that it has stronger structural feature extraction ability. Compared with DGCNN, its performance is more stable when there are fewer samples.
Computer Network
Survey of Layered Architecture in Large-scale FANETs
YOU Wen-jing, DONG Chao, WU Qi-hui
Computer Science. 2020, 47 (9): 226-231.  doi:10.11896/jsjkx.190900164
In recent years, with the development of electronic and communication technologies, UAVs have tended to become miniaturized, and large-scale Unmanned Aerial Vehicle (UAV) formations represented by UAV swarms have attracted the attention of industry and academia. Considering increasingly complex tasks and application environments, autonomous UAV formations have become an important development direction. In order to realize autonomous control of a formation, Flying Ad hoc Networks (FANETs), which can provide efficient and flexible communication among the UAVs, become critical. However, large scale brings a series of challenges to resource allocation, channel access and network routing in FANETs, and a layered architecture can effectively deal with these challenges. Firstly, this paper introduces the research status of two kinds of common layered architectures, clustering and alliance, and analyzes the application environments of both. Then it makes a comparative study of the two kinds of architectures. Finally, potential future research directions are discussed in detail.
Novel Real-time Algorithm for Critical Path of Linear Network Coding
HAN Xiao-dong, GAO Fei, ZHANG Li-wei
Computer Science. 2020, 47 (9): 232-237.  doi:10.11896/jsjkx.190800023
Abstract PDF(1640KB) ( 784 )   
References | Related Articles | Metrics
Nowadays, the amount of information stored and exchanged in human society is growing geometrically, and both the throughput and the real-time performance of data transmission need to be improved. While existing studies of network coding focus on improving throughput, the significant impact of real-time performance on multipath transmission in big-data networks is ignored. This paper addresses the fastest-arrival problem for linear network coding and proposes a critical path computation algorithm with optimized matrix multiplication to improve real-time performance. In particular, this paper uses abstract algebra to analyze the critical path algorithm, constructs a commutative ring for the critical path, and proves the optimal substructure property. Simulation results show that the optimized algorithm significantly reduces the time complexity of critical path computation to O(n^2.81 lg n), shortens propagation delays and improves real-time performance. The time-cost growth rate of the Strassen-based critical path algorithm is significantly lower than that of the repeated-squaring critical path algorithm when n>6. In particular, when n=12, the computational complexity of the Strassen-based critical path algorithm is approximately 2/3 of that of the repeated-squaring critical path algorithm, and its time overhead is about 1/2 of the latter.
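For context only: the O(n^2.81) factor cited above comes from the Strassen recursion, shown below for ordinary numeric matrices as a minimal, hedged sketch; the paper's algorithm applies the recursion inside the commutative ring it constructs for critical paths, which is not reproduced here.

import numpy as np

def strassen(A, B, leaf=64):
    # Strassen multiplication for n x n matrices with n a power of two.
    # The 7 recursive sub-products give the O(n^2.81) exponent cited above.
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.vstack([np.hstack([C11, C12]), np.hstack([C21, C22])])

# Sanity check against ordinary matrix multiplication.
A, B = np.random.rand(128, 128), np.random.rand(128, 128)
print(np.allclose(strassen(A, B), A @ B))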
Cloud Resource Scheduling Mechanism Based on Adaptive Virtual Machine Migration
LI Shuang-gang, ZHANG Shuang, WANG Xing-wei
Computer Science. 2020, 47 (9): 238-245.  doi:10.11896/jsjkx.190900189
Abstract PDF(2453KB) ( 1096 )   
References | Related Articles | Metrics
Virtual machine (VM) migration is an important research area of current cloud computing resource scheduling. The continuous growth of users has brought new challenges, and typical migration strategies have difficulty adapting to dynamically changing internal and external environments. Aiming at this problem, this paper proposes an overall framework of adaptive VM migration. By modeling VM migration, the concepts of "migration path" and "service overhead" are introduced, and the servers' CPU utilization and the bandwidth utilization of links between servers are used as indicators to plan the optimal migration path for all to-be-migrated VMs in the system so as to minimize the total service overhead. Firstly, a threshold-based selection algorithm is presented for selecting the to-be-migrated VMs. Secondly, a time-series prediction algorithm based on the auto-regressive integrated moving average (ARIMA) model is designed to predict the service overhead within each server's future time window. Then, a migration path calculation algorithm is designed based on the servers' predicted service overhead and dynamic programming, and an optimal migration plan is made for each to-be-migrated VM. Finally, an adaptive adjustment algorithm for the prediction window is designed and implemented, based on the window's performance as measured by the difference between the service overhead predicted along the migration path and the actual value. Experiments show that adaptive VM migration performs well in terms of adaptive adjustment and minimization of service overhead.
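A minimal sketch of the ARIMA forecasting step, using the statsmodels library; the overhead values, the model order (1, 1, 1) and the three-step window are placeholders, not those used in the paper.

from statsmodels.tsa.arima.model import ARIMA

# Per-interval service overhead observed for one server (illustrative values only).
history = [0.42, 0.45, 0.44, 0.50, 0.53, 0.49, 0.55, 0.58, 0.57, 0.61]

# Fit a small ARIMA(p, d, q) model; this order is a placeholder, not the one
# selected in the paper.
model = ARIMA(history, order=(1, 1, 1)).fit()

# Forecast the overhead over the next prediction window (here, 3 intervals).
window = 3
print(model.forecast(steps=window))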
NFV Memory Resource Management in 5G Communication Network
SU Chang, ZHANG Ding-quan, XIE Xian-zhong, TAN Ya
Computer Science. 2020, 47 (9): 246-251.  doi:10.11896/jsjkx.190800008
Abstract PDF(2371KB) ( 798 )   
References | Related Articles | Metrics
With the deepening of 5G research and the advancement of its commercialization, various challenges have arisen. Among them, resource management of the 5G communication system is a key challenge for 5G network research. Network Function Virtualization (NFV) technology provides key support for 5G implementation, and it also introduces new research directions for 5G resource management. However, resource management in network function virtualization scenarios is a more complex issue; in particular, different placements of virtual network functions affect their performance differently. Firstly, this paper analyzes and studies the impact of NFV resource allocation methods and NFV placement on performance. On this basis, following the example of Knowledge-Defined Networking (KDN), this paper applies machine-learning technology to the management of memory resources of virtual network functions, constructing a neural network learning model to predict memory resource consumption. Secondly, the paper focuses on extracting the characteristics of the input traffic: the traffic is represented by a set of features describing small batches of information from the data link layer to the transport layer, while the average memory consumption of each batch is obtained from the hypervisor through a performance monitoring tool. Finally, memory resources are managed by using the neural network to predict memory resource consumption.
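As an illustrative sketch only (feature dimensions, network size and data are invented, not those of the paper), a regression network mapping per-batch traffic features to average memory consumption could look like the following, built here with scikit-learn's MLPRegressor.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: per-batch traffic features (data link layer to transport layer statistics);
# y: average memory consumption of each batch as reported by the hypervisor.
# Both are random placeholders here, not data from the paper.
rng = np.random.default_rng(0)
X = rng.random((200, 12))
y = rng.random(200)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
)
model.fit(X, y)
print(model.predict(X[:5]))   # predicted memory consumption for 5 batches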
RFID Indoor Relative Position Positioning Algorithm Based on ARIMA Model
XU He, WU Man-xing, LI Peng
Computer Science. 2020, 47 (9): 252-257.  doi:10.11896/jsjkx.200400038
Abstract PDF(2790KB) ( 1031 )   
References | Related Articles | Metrics
In indoor positioning scenarios, there is often a need to obtain the order in which certain items are placed. RFID (Radio Frequency Identification) is one of the candidate solutions because of its light weight and low cost. To solve the problem of relative positioning of items, this paper studies an ARIMA-based phase and time-series prediction model and proposes an indoor relative position positioning algorithm based on UHF (Ultra-High Frequency) RFID tags. Using passive RFID tags and readers, the RFID antenna is moved to collect phase values; the ARIMA model is used to predict the sequence of phase changes during the antenna's movement and the time stamps at which the phase reaches certain values. Weights are then assigned to the time stamps of special phase points observed during the phase change, and the final time stamps are obtained to sort the relative positions. Experiments show that this RFID indoor relative position positioning system achieves a recognition accuracy of 96.67% for book sequence detection in a library environment. Compared with the classical STPP algorithm and the HMRL algorithm, its performance is greatly improved.
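One practical detail worth noting (a general observation about RFID phase data, not a step taken from the paper) is that reader-reported phases wrap modulo 2*pi, so the raw sequence is usually unwrapped before a time-series model such as ARIMA is fitted; the readings below are invented.

import numpy as np

# Reader-reported phases wrap modulo 2*pi; unwrapping recovers a smooth phase
# curve that a time-series model can be fitted to.
wrapped = np.array([6.1, 0.3, 0.9, 1.6, 2.4, 3.1, 3.9, 4.8, 5.7, 0.2])
smooth = np.unwrap(wrapped)   # removes the artificial 2*pi jumps
print(smooth)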
HATBED:A Distributed Hardware Assisted Tracing Testbed for IoT
MA Jun-yan, LI Yi, LI Shang-rong, ZHANG Te, ZHANG Ying
Computer Science. 2020, 47 (9): 258-264.  doi:10.11896/jsjkx.191000048
Abstract PDF(3708KB) ( 865 )   
References | Related Articles | Metrics
Internet of Things systems, such as wireless sensor networks, are usually characterized by highly constrained resources and tight coupling with the physical world, which makes the devices difficult to debug after deployment. It is therefore especially important to thoroughly test and profile such systems before deploying them to the real world. Due to their intrusiveness, traditional serial-port-based debugging methods are inadequate for detailed tracing on resource-constrained devices. This paper studies the application of hardware-assisted tracing technology to the testing and evaluation of embedded network sensor systems, and then designs and realizes a Hardware Assisted Tracing testBed (HATBED). HATBED consists of a controller, observers and targets, and provides three services: network-wide remote debugging, flexible software tracing and non-intrusive software profiling. It supports non-intrusive tracing and profiling without relying on operating systems or applications. In the experiments, this paper benchmarks time and power consumption, timing accuracy and code coverage under bare-metal and FreeRTOS settings. It then tests the RIOT-OS examples, completing high-timing-accuracy profiling of the ping6 command as well as function coverage and basic-block coverage of UDP communication. With the help of hardware-assisted tracing technology, HATBED can evaluate resource-constrained Internet of Things systems more efficiently and adequately.
Study on Complex Network Cascading Failure Based on Totally Asymmetric Simple Exclusion Process Model
YANG Chao, LIU Zhi
Computer Science. 2020, 47 (9): 265-269.  doi:10.11896/jsjkx.190700069
Abstract PDF(2423KB) ( 805 )   
References | Related Articles | Metrics
Studying the impact of cascading failures on the dynamic behavior of complex networks has high application value for maintaining network security and ensuring network stability. From the perspective of network cascading, the change of system traffic in the totally asymmetric simple exclusion process (TASEP) model is analyzed; accordingly, this paper uses a network model based on the TASEP for cascading failure research. The size of the largest strongly connected subgraph, the number of strongly connected subgraphs and the network current are compared, showing that the size of the largest strongly connected subgraph is positively correlated with the current, and that the minimum threshold of the network current is determined by the number of strongly connected subgraphs. Then, simulation experiments are carried out on networks with different average degrees, which show that as the edge removal rate increases, networks with a larger average degree exhibit a slower decline in traffic. Finally, different particle densities are examined; the simulations show that changes in average density have little effect on the rate of flow decline in the low-density and high-density regimes, while the decline rate of the current is almost constant in the intermediate density interval.
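For readers unfamiliar with the model, the sketch below is a minimal random-sequential TASEP on an open one-dimensional lattice (entry rate alpha, exit rate beta); it is a generic textbook-style illustration, not the network-structured model used in the paper, and all parameter values are arbitrary.

import numpy as np

def tasep_sweep(lattice, alpha, beta, rng):
    # One random-sequential sweep of the TASEP on an open 1-D lattice:
    # alpha is the entry rate at the left boundary, beta the exit rate on the right.
    L = len(lattice)
    for _ in range(L + 1):
        i = rng.integers(-1, L)                  # -1 denotes the entry reservoir
        if i == -1:
            if lattice[0] == 0 and rng.random() < alpha:
                lattice[0] = 1                   # a particle enters
        elif i == L - 1:
            if lattice[i] == 1 and rng.random() < beta:
                lattice[i] = 0                   # a particle exits
        elif lattice[i] == 1 and lattice[i + 1] == 0:
            lattice[i], lattice[i + 1] = 0, 1    # hop to the right

rng = np.random.default_rng(0)
lattice = np.zeros(100, dtype=int)
for _ in range(10000):
    tasep_sweep(lattice, alpha=0.3, beta=0.3, rng=rng)
print(lattice.mean())                            # steady-state particle density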
Energy Efficient Virtual Network Mapping Algorithms Based on Node Topology Awareness
ZHU Guo-hui, ZHANG Yin, LIU Xiu-xia, SUN Tian-ao
Computer Science. 2020, 47 (9): 270-274.  doi:10.11896/jsjkx.190700162
Abstract PDF(1964KB) ( 849 )   
References | Related Articles | Metrics
Aiming at the problem of over-saturation of existing network resources, this paper proposes an efficient and energy-saving virtual network mapping algorithm based on node topology awareness. In the node mapping stage, the proposed algorithm quantifies the cost of node mapping while considering topological attributes; it evaluates the candidate physical nodes of each virtual node through an improved node ranking algorithm and selects the best mapping nodes. In the link mapping stage, the Dijkstra algorithm is applied with redefined link weights that take into account the maximum residual link bandwidth, the maximum residual resources of the nodes along the path, and the hop count. To achieve the goal of energy saving and high efficiency, the ranking value is used to obtain the effective link with the lowest energy cost. The simulation results show that the proposed ranking method can effectively reduce the energy cost and significantly improve metrics such as the request acceptance rate and the revenue-cost ratio of virtual networks.
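As an illustration of the link mapping step only, the sketch below runs plain Dijkstra over a graph whose edge costs are assumed to already combine the factors mentioned above (for instance a hop penalty plus the reciprocal of residual bandwidth); the cost formula and the toy topology are assumptions, not the paper's definition.

import heapq

def best_path(graph, src, dst):
    # Plain Dijkstra; each edge cost is assumed to already combine the factors
    # above, e.g. hop_penalty + 1 / residual_bandwidth (an assumed formula).
    dist, prev = {src: 0.0}, {}
    heap, visited = [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:                           # reconstruct the chosen path
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

# Toy substrate topology with pre-computed composite edge costs.
graph = {"a": [("b", 1.2), ("c", 0.8)], "b": [("d", 1.0)], "c": [("d", 1.5)]}
print(best_path(graph, "a", "d"))                # (['a', 'b', 'd'], 2.2)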
Information Security
Comprehensive Review of Secure Electronic Voting Schemes
PU Hong-quan, CUI Zhe, LIU Ting, RAO Jin-tao
Computer Science. 2020, 47 (9): 275-282.  doi:10.11896/jsjkx.190900125
Abstract PDF(1745KB) ( 2862 )   
References | Related Articles | Metrics
In recent years, electronic voting has attracted much attention because it can greatly improve the efficiency of voting activities and the accuracy of their results, but security has remained the bottleneck of its development. Many researchers have put forward electronic voting schemes for specific application scenarios. In light of the current academic research on electronic voting, this paper analyzes in detail the types, models and security requirements of electronic voting, summarizes and analyzes four types of typical electronic voting schemes that combine blind signatures, secret sharing and other cryptographic techniques, then introduces mature electronic voting systems, and finally studies possible future development directions of electronic voting, providing a reference for the further optimization and improvement of electronic voting schemes.
Overview of Deepfake Video Detection Technology
BAO Yu-xuan, LU Tian-liang, DU Yan-hui
Computer Science. 2020, 47 (9): 283-292.  doi:10.11896/jsjkx.200400130
Abstract PDF(2576KB) ( 4567 )   
References | Related Articles | Metrics
The abuse of deepfakes brings potential threats to countries, society and individuals. Firstly, this paper introduces the concept and current trend of deepfakes, analyzes the generation principle and models of deepfake videos based on generative adversarial networks, and introduces video data processing algorithms and the mainstream deepfake datasets. Secondly, it summarizes detection methods based on tampering features within video frames: for the detection of visual artifacts and facial noise features in deepfake video frames, the relevant machine learning and deep learning classification algorithms and models are introduced. Next, for the spatio-temporal inconsistency between deepfake video frames, the relevant time-series algorithms and detection methods are introduced. After that, the tamper-proof public mechanism based on blockchain tracing, as well as information security methods such as digital watermarking and video fingerprinting, are introduced as supplementary detection means. Finally, future research directions of deepfake video detection technology are summarized.
Research on Lattice-based Quantum-resistant Authenticated Key Agreement Protocols:A Survey
NI Liang, WANG Nian-ping, GU Wei-li, ZHANG Qian, LIU Ji-zhao, SHAN Fang-fang
Computer Science. 2020, 47 (9): 293-303.  doi:10.11896/jsjkx.200400138
Abstract PDF(1512KB) ( 2019 )   
References | Related Articles | Metrics
Recent advances in quantum computing have posed a serious potential security threat to the majority of current network security protocols, whose security relies on classical number-theoretic hard problems. As fundamental network security protocols, authenticated key agreement protocols bear the brunt, so quantum-resistant authenticated key agreement protocols have become a hot research topic. Among them, lattice-based post-quantum cryptographic schemes, with strong security and high computational efficiency, have gained extensive attention in recent years and are developing rapidly; they are expected to be included in future standards for quantum-resistant cryptographic algorithms. This paper focuses on research into lattice-based post-quantum authenticated key agreement protocols. Firstly, the research background of quantum-resistant authenticated key agreement protocols is introduced, and the main computationally hard problems on which the security of current lattice-based post-quantum cryptographic schemes depends are described. Then, an overview of the existing typical lattice-based post-quantum authenticated key agreement protocols is given; taking two-party protocols as the main research object, the basic construction modes of related schemes and the performance of several typical protocols are discussed, analyzed and compared. Lastly, the existing problems in current research are summarized, and the future development of related research is forecast.
Certificateless Signature Scheme Without Bilinear Pairings and Its Application in Distribution Network
LIU Shuai, CHEN Jian-hua
Computer Science. 2020, 47 (9): 304-310.  doi:10.11896/jsjkx.200500002
Abstract PDF(1476KB) ( 830 )   
References | Related Articles | Metrics
The certificateless cryptosystem solves the complex problem of public key certificate management in the traditional public key cryptosystem and the key escrow problem in the identity-based cryptosystem. This paper proposes a certificateless signature scheme based on elliptic curves without bilinear pairings. In the random oracle model, under the hardness assumption of the elliptic curve discrete logarithm problem and by using the Forking lemma, this paper proves that the scheme can resist attacks from both Type I strong adversaries and Type II adversaries. The performance of the scheme is then compared with that of four other elliptic-curve-based certificateless signature schemes proposed since 2016; all signature schemes are implemented in C and their efficiency is compared. The results show that the average total time consumption of the proposed scheme is similar to that of Jia's scheme, and is 51.0%, 10.4% and 22.1% lower than that of Hassouna's, Zhang's and Karati's schemes respectively, which shows that the overall efficiency of the proposed scheme has certain advantages. Finally, the signature scheme is applied to message authentication for Modbus TCP (Transmission Control Protocol) communication in the distribution network. Security analysis of the proposed authentication protocol shows that it can resist replay attacks, impersonation attacks and man-in-the-middle attacks.
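The paper's certificateless construction is not reproduced here; purely as a hedged illustration of the kind of pairing-free, discrete-logarithm-based signing it builds on, the sketch below implements a toy Schnorr-style signature in a prime-order subgroup of Z_p* (an elliptic-curve group would be used in practice), with tiny, insecure parameters chosen only so the example runs.

import hashlib
import secrets

# Toy Schnorr-style signature in a prime-order subgroup of Z_p*.  The paper's
# scheme is certificateless and uses an elliptic-curve group instead; these
# parameters are insecure demo values (q divides p - 1, g has order q).
p, q, g = 607, 101, 64

def H(r, msg):
    data = f"{r}|{msg}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)      # (private key, public key)

def sign(x, msg):
    k = secrets.randbelow(q - 1) + 1
    r = pow(g, k, p)
    e = H(r, msg)
    return e, (k + x * e) % q

def verify(y, msg, sig):
    e, s = sig
    r = (pow(g, s, p) * pow(y, -e, p)) % p   # g^s * y^(-e) = g^k
    return H(r, msg) == e

x, y = keygen()
print(verify(y, "modbus frame", sign(x, "modbus frame")))   # True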
DGA Domains Detection Based on Artificial and Depth Features
HU Peng-cheng, DIAO Li-li, YE Hua, YANG Yan-lan
Computer Science. 2020, 47 (9): 311-317.  doi:10.11896/jsjkx.191000118
Abstract PDF(2760KB) ( 1909 )   
References | Related Articles | Metrics
Nowadays, various malware families use domain generation algorithms (DGAs) to generate large numbers of pseudo-random domain names to connect to C&C (Command and Control) servers and launch the corresponding attacks. Existing DGA domain detection methods fall into two categories. One is machine learning methods that build artificial features based on the randomness of DGA domain names; such algorithms suffer from time-consuming and laborious feature engineering and from high false alarm rates. The other uses deep learning techniques such as LSTM and GRU to learn the sequential structure of DGA domain names; such algorithms have low detection accuracy for DGA domains with low randomness. Therefore, this paper proposes a generic feature extraction scheme for domain names, establishes a dataset containing 41 DGA domain name families, and designs a detection algorithm based on artificial features and depth features, which enhances the generalization ability of the model and increases the number of DGA domain families that can be identified. Experimental results show that the proposed DGA domain name detection algorithm based on artificial and depth features achieves higher accuracy and better generalization ability than traditional deep learning methods.
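The exact feature set of the paper is not reproduced; the following sketch computes a few hand-crafted ("artificial") features commonly used in DGA detection, such as length, character entropy, vowel ratio and digit ratio, to illustrate what this kind of feature engineering looks like.

import math
from collections import Counter

def domain_features(domain):
    # A few hand-crafted features commonly used for DGA detection; the exact
    # feature set of the paper is not reproduced here.
    name = domain.split(".")[0].lower()          # drop the TLD
    length = len(name)
    counts = Counter(name)
    entropy = -sum((c / length) * math.log2(c / length) for c in counts.values())
    return {
        "length": length,
        "entropy": entropy,                      # pseudo-random names score high
        "vowel_ratio": sum(name.count(v) for v in "aeiou") / length,
        "digit_ratio": sum(ch.isdigit() for ch in name) / length,
    }

print(domain_features("xjfd8k2lqpzt.com"))       # DGA-like
print(domain_features("wikipedia.org"))          # benign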
Multi-keyword Semantic Search Scheme for Encrypted Cloud Data
LI Yan, SHEN De-rong, NIE Tie-zheng, KOU Yue
Computer Science. 2020, 47 (9): 318-323.  doi:10.11896/jsjkx.190800139
Abstract PDF(1811KB) ( 801 )   
References | Related Articles | Metrics
Due to the flexibility, versatility and low cost of cloud services, it is common to hand data over to cloud servers for management. However, cloud servers are not fully trusted, so outsourcing encrypted data to cloud servers while supporting search over the ciphertext is one of the hot issues in current research. Although encryption protects data privacy and security, it obscures the semantic information of the data and makes searching more difficult. This paper proposes a multi-keyword secure semantic search solution for encrypted cloud data. The core idea is to obtain each document's topic vector and each topic's word distribution vector from a topic model, and to compute the semantic similarity between the query keywords and each topic, from which a query vector is generated so that its similarity to the document topic vectors can be measured in the same vector space. A method that calculates the similarity between the query and the topics based on EMD (Earth Mover's Distance) combined with word embeddings is proposed to improve the accuracy of semantic similarity. To support efficient semantic search, a topic vector index tree is constructed and a "greedy search" algorithm is used to optimize keyword search. Finally, theoretical analysis and experimental results show that the proposed solution achieves secure multi-keyword semantic ranked search and greatly improves search efficiency.
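As a simplified illustration of the ranking step (the paper measures query-topic similarity with EMD over word embeddings; plain cosine similarity is used here only to keep the sketch self-contained), documents can be ranked by comparing the query vector with each document's topic vector in the same vector space; the toy vectors are invented.

import numpy as np

def rank_documents(query_topic_vec, doc_topic_vecs, top_k=3):
    # Rank documents by cosine similarity between the query's topic vector and
    # each document's topic vector (the paper's EMD-based measure is replaced
    # by cosine here purely to keep the sketch self-contained).
    q = query_topic_vec / np.linalg.norm(query_topic_vec)
    D = doc_topic_vecs / np.linalg.norm(doc_topic_vecs, axis=1, keepdims=True)
    scores = D @ q
    order = np.argsort(-scores)[:top_k]
    return list(zip(order.tolist(), scores[order].tolist()))

# Toy data: 4 documents and a query over 3 latent topics (values are invented).
docs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.05, 0.05, 0.9]])
query = np.array([0.6, 0.3, 0.1])
print(rank_documents(query, docs, top_k=2))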
Extended Algorithm of Pairwise Constraints Based on Security
YANG Fan, WANG Jun-bin, BAI Liang
Computer Science. 2020, 47 (9): 324-329.  doi:10.11896/jsjkx.200700092
Abstract PDF(1429KB) ( 746 )   
References | Related Articles | Metrics
Cluster analysis based on pairwise constraints is an important research direction in semi-supervised learning, and the number of pairwise constraints has become an important factor affecting the effectiveness of this type of algorithm. However, in practical applications, acquiring pairwise constraints is costly. Therefore, an extended algorithm of pairwise constraints based on security (PCES) is proposed. The algorithm takes the maximum locally connected distance within the transitive closures as the safety value; according to this value, the similarity between different transitive closures is modified to reduce the risk of incorrectly merging them. Finally, graph clustering is used to merge similar transitive closures and thereby extend the pairwise constraints. The algorithm can not only extend pairwise constraints safely and effectively, but the extended constraints can also be applied to different semi-supervised clustering algorithms. This paper compares pairwise-constraint extension algorithms on eight benchmark data sets, and the experimental results show that the proposed algorithm can extend pairwise constraints safely and effectively.
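For background, the transitive closures mentioned above are simply the connected components induced by must-link constraints; a minimal union-find sketch of how such closures can be formed is shown below (illustrative only, not the paper's implementation).

class UnionFind:
    # Merges must-link pairs into transitive closures (connected components).
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

must_links = [(1, 2), (2, 3), (5, 6)]
uf = UnionFind()
for a, b in must_links:
    uf.union(a, b)

# Samples 1, 2 and 3 end up in one closure, 5 and 6 in another.
print(uf.find(3) == uf.find(1), uf.find(5) == uf.find(1))   # True False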
High Trusted Cloud Storage Model Based on TBchain Blockchain
LI Ying, YU Ya-xin, ZHANG Hong-yu, LI Zhen-guo
Computer Science. 2020, 47 (9): 330-338.  doi:10.11896/jsjkx.190800147
Abstract PDF(2673KB) ( 848 )   
References | Related Articles | Metrics
Data stored in the cloud can be illegally stolen or tampered with, exposing users' data to confidentiality threats. In order to store massive data more safely and efficiently, this paper first proposes a storage model, CBaaS (Cloud and Blockchain as a Service), which combines cloud storage and blockchain to support indexing, traceability and verifiability, thereby enhancing the credibility of data in the cloud. Secondly, the blockchain consensus protocol leads to low transaction throughput and slow processing, which seriously restricts the development of decentralized applications. To address this, this paper implements a three-tier blockchain model, TBchain, which improves blockchain scalability and transaction throughput by partitioning part of the blockchain and locking it into the blocks of a higher-level chain. Next, because of the demand for decentralization, a blockchain occupies a large amount of storage space on a massive number of nodes, which greatly limits the development and application of database systems based on blockchain technology; TBchain stores part of the transactions locally, which increases the scalability of blockchain storage capacity. The ETag in the cloud storage object metadata identifies the content of an object and can be used to check whether that content has changed. By storing the object metadata on the blockchain, where it cannot be tampered with, the ETag value can be used to verify whether the data stored in the cloud has been modified, improving the reliability of cloud-stored data. The experimental results show that the TBchain model improves the scalability and storage-capacity scalability of the blockchain, and that the CBaaS model improves the reliability of data stored in the cloud.
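As a hedged illustration of the verification step, the sketch below recomputes the MD5 digest of a downloaded object and compares it with the ETag recorded on the chain; note that many object stores use the content's MD5 as the ETag only for simple (non-multipart) uploads, so treating the ETag as an MD5 digest is an assumption here, not a detail taken from the paper.

import hashlib

def object_matches_onchain_etag(content: bytes, onchain_etag: str) -> bool:
    # Recompute the MD5 digest of the downloaded object and compare it with the
    # ETag recorded on the blockchain; assumes a simple (non-multipart) upload,
    # for which many object stores use the content MD5 as the ETag.
    local_etag = hashlib.md5(content).hexdigest()
    return local_etag == onchain_etag.strip('"').lower()

# Usage: download the object from cloud storage, read the ETag recorded in the
# TBchain transaction that indexed it, then verify integrity.
data = b"example object body"
recorded = hashlib.md5(data).hexdigest()     # stand-in for the on-chain value
print(object_matches_onchain_etag(data, recorded))   # True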