Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 49 Issue 11, 15 November 2022
  
Computer Software
Research Progress and Challenge of Programming by Examples
YAN Qian-yu, LI Yi, PENG Xin
Computer Science. 2022, 49 (11): 1-7.  doi:10.11896/jsjkx.211000225
Program synthesis means that the computer automatically constructs code that conforms to a specified grammar and the user's given specifications. Programming by examples is a paradigm of program synthesis that takes input-output examples as the specification. It has high usability and a low learning cost. In recent years, it has been successfully applied in areas such as data wrangling and string transformation, and has great potential for development. There are two main problems to be solved in programming by examples. One is efficient search in a huge program space, and the other is ambiguity among candidate solutions. To solve the first problem, a programming-by-examples method needs to select an appropriate domain-specific language and formulate a search algorithm as its search strategy. The applied search algorithms can be classified into rule-based algorithms and algorithms based on statistical models. To solve the second problem, a programming-by-examples method needs to formulate a ranking strategy. The applied ranking strategies can be classified into ranking based on the given examples and ranking based on user interaction. This paper sorts out related literature on programming by examples in recent years, summarizes the methods and key technical points that address the above two problems, and finally gives suggestions for future research directions in the field.
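For illustration only, the two problems can be seen in a toy enumerative synthesizer over an invented four-operation string DSL; the DSL, the rank-by-length heuristic and the examples below are assumptions of this sketch, not any surveyed system:

    # Toy programming-by-examples loop: enumerate a tiny string DSL,
    # keep programs consistent with the examples, rank by simplicity.
    import itertools

    # Hypothetical DSL: each program is a pipeline of primitive ops.
    OPS = {
        "upper":  str.upper,
        "lower":  str.lower,
        "strip":  str.strip,
        "first3": lambda s: s[:3],
    }

    def run(program, s):
        for op in program:
            s = OPS[op](s)
        return s

    def synthesize(examples, max_len=3):
        # Problem 1: search the (exponential) program space by enumeration.
        candidates = []
        for n in range(1, max_len + 1):
            for program in itertools.product(OPS, repeat=n):
                if all(run(program, i) == o for i, o in examples):
                    candidates.append(program)
        # Problem 2: ambiguity -- several programs fit; rank shorter first.
        return sorted(candidates, key=len)

    examples = [(" Hello ", "HEL"), (" world ", "WOR")]
    print(synthesize(examples)[:3])   # multiple consistent programs survive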
Research and Progress on Bug Report-oriented Bug Localization Techniques
NI Zhen, LI Bin, SUN Xiao-bing, LI Bi-xin, ZHU Cheng
Computer Science. 2022, 49 (11): 8-23.  doi:10.11896/jsjkx.220200117
Software bug localization is an important task in the bug fixing process. Bug report-oriented localization approaches typically use software bug reports, which describe the phenomena of bugs, as queries, and source code as the corpus. First, correlations between the bug report and each source code unit are analyzed. Then, a bug localization data set is created by mining the software repository, aiming to construct a bug localization model that locates the source code unit (i.e., the bug location) corresponding to the bug report. This paper offers a systematic survey of domestic and foreign research achievements on bug localization in recent years. First, the related concepts in bug report-oriented bug localization are introduced and the main localization process is summarized, followed by a discussion of existing research on the three key steps of the localization process. Then, the commonly used data sets and evaluation metrics for bug localization are summarized. Finally, future work in this research area is discussed.
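The retrieval core of such approaches (treating the bug report as a query over the source corpus) can be sketched with generic TF-IDF cosine similarity; the file contents and report below are invented, and the surveyed models add many further signals on top of this baseline:

    # Minimal IR-style bug localization: rank source files by their
    # TF-IDF cosine similarity to the bug report text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    source_files = {                      # hypothetical corpus
        "parser.py": "parse tokens syntax tree error recovery",
        "net.py":    "socket connect timeout retry buffer",
        "render.py": "draw canvas pixel color buffer flush",
    }
    bug_report = "connection timeout and retry fails on slow socket"

    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(list(source_files.values()))
    query_vec = vectorizer.transform([bug_report])

    scores = cosine_similarity(query_vec, doc_vecs)[0]
    for name, score in sorted(zip(source_files, scores), key=lambda p: -p[1]):
        print(f"{name}: {score:.3f}")     # net.py should rank first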
Study on Integration Test Order Generation Algorithm for SOA
ZHANG Bing-qing, FEI Qi, WANG Yi-chen, YANG Zhao
Computer Science. 2022, 49 (11): 24-29.  doi:10.11896/jsjkx.210400210
Integration test order generation is an important problem in software integration testing research. A reasonable test order can improve the efficiency of integration testing and reduce its cost. Service-oriented architecture (SOA) is a kind of distributed architecture widely used in enterprises in recent years, yet there is currently little research on integration test order generation for SOA. Due to the polymorphism of service composition in SOA, the traditional top-down and bottom-up integration test strategies cannot produce an integration test order for the service software in an SOA system, while current research on class-cluster-based integration test order generation in object-oriented systems is difficult to apply to the complex coupling relationships between services in SOA. Therefore, an integration test order generation method based on a genetic algorithm is proposed to solve the integration testing problem for service software in SOA. In this method, the concept of a service feature group is used to represent the influencing factors of integration testing, and the concept of integration testing priority is used to represent the importance of integration testing of service software. At the same time, a test dependency graph is constructed to describe the complex coupling relationships between service software in SOA. To reduce the cost of testing, a genetic algorithm is designed to generate the integration test order. Finally, an example is given to verify the feasibility and correctness of the method. The results show that the proposed method can generate service software integration test orders with relatively high test priority and low test cost.
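A minimal genetic-algorithm skeleton for this kind of test-order problem follows; the dependency table and the fitness function (penalizing services integrated before their dependencies) are illustrative stand-ins for the paper's service-feature-group and priority model:

    # Toy GA for integration test order: permutations of services are
    # individuals; fitness penalizes testing a service before its deps.
    import random

    SERVICES = ["A", "B", "C", "D", "E"]
    DEPS = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": {"D"}}  # hypothetical

    def cost(order):
        seen, penalty = set(), 0
        for s in order:
            penalty += len(DEPS.get(s, set()) - seen)  # unmet dependencies
            seen.add(s)
        return penalty

    def crossover(p1, p2):                 # order crossover (OX)
        i, j = sorted(random.sample(range(len(p1)), 2))
        middle = p1[i:j]
        rest = [s for s in p2 if s not in middle]
        return rest[:i] + middle + rest[i:]

    def mutate(order):                     # swap two positions in place
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]

    pop = [random.sample(SERVICES, len(SERVICES)) for _ in range(30)]
    for _ in range(100):
        pop.sort(key=cost)
        survivors = pop[:10]
        children = [crossover(random.choice(survivors), random.choice(survivors))
                    for _ in range(20)]
        for c in children:
            if random.random() < 0.2:
                mutate(c)
        pop = survivors + children

    best = min(pop, key=cost)
    print(best, "cost:", cost(best))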
Decision Tree Algorithm-based API Misuse Detection
LI Kang-le, REN Zhi-lei, ZHOU Zhi-de, JIANG He
Computer Science. 2022, 49 (11): 30-38.  doi:10.11896/jsjkx.211100177
Application programming interfaces (APIs) help to effectively improve software development efficiency by reusing existing software frameworks or libraries. However, many constraints must be satisfied to use APIs correctly, such as call order and exception handling. Violating these constraints causes API misuse, which may result in software crashes, errors, or vulnerabilities. Although many API misuse detection techniques have been proposed, they still face two challenges: 1) acquiring API usage specifications is difficult, and 2) detecting many different types of API misuse at the same time is difficult. To address these challenges, a decision tree algorithm-based API misuse detection method is proposed. First, the API usage source code is converted into an API usage graph, and API usage specifications are mined from the graph, which effectively solves the first challenge. Second, an API usage decision tree is constructed based on the obtained API specification information, and the generalization ability of the decision tree is improved by incorporating pruning strategies. Finally, a combination of coarse-grained and fine-grained detection is proposed in the detection phase to improve the detection capability of the API usage decision tree, which effectively solves the second challenge. Experimental results show that the proposed approach can detect API misuse defects to a certain extent.
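As a toy illustration of the classification step only, a decision tree can be trained on boolean usage features of the kind a graph miner might extract; the features and labels below are invented and far simpler than real API usage graphs:

    # Toy API-misuse classifier: a decision tree over boolean features
    # that a usage-graph miner might extract (features here are invented).
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features per usage: [null_checked, closed_after, in_try_except]
    X = [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1],
         [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    y = [0, 0, 1, 1, 1, 1, 1, 1]           # 0 = correct usage, 1 = misuse

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=[
        "null_checked", "closed_after", "in_try_except"]))
    print(tree.predict([[1, 1, 1]]))        # a fully guarded usage -> correct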
AutoUnit: Automatic Test Generation Based on Active Learning and Prediction Guidance
ZHANG Da-lin, ZHANG Zhe-wei, WANG Nan, LIU Ji-qiang
Computer Science. 2022, 49 (11): 39-48.  doi:10.11896/jsjkx.220200086
Automated test case generation technology aims to reduce testing costs. Compared with manual test generation, it has higher testing efficiency. Most existing testing tools treat all files in the software equally, but in fact, files with defects account for only a small part of the whole code base. Therefore, if testers can identify the files that are more prone to defects, they can greatly save testing resources. To solve this problem, this paper designs AutoUnit, a prediction-guided testing tool based on active learning. It first predicts the defective files in the whole pool of files to be examined, then uses the detection tool to test the most "suspicious" files, feeds the actual detection results back to the prediction model, and updates the model before the next round of prediction. In addition, when the total number of defective files is unknown, AutoUnit can stop in time by setting different target recall rates: it estimates the total number of defective files from the files already tested, calculates the current recall, and judges whether to stop the predictive guidance, thus ensuring testing efficiency. Experimental analysis shows that when the same number of defective files are found, the shortest and longest times taken by AutoUnit are 70.9% and 80.7% of those of current mainstream testing tools, respectively. When the total number of defective files is unknown and the target recall rate is set to 95%, compared with the latest version of EvoSuite, AutoUnit only needs to check 29.7% of the source code files to achieve the same detection level, and its testing time is only 34.6% of EvoSuite's, greatly reducing the cost of testing. Experimental results show that the method effectively improves testing efficiency.
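A minimal sketch of such a predict-test-feedback loop with a recall-based stopping rule follows; the static defect-probability scores stand in for AutoUnit's retrained prediction model, and the ground truth is simulated:

    # Sketch of an active-learning test loop: test the most "suspicious"
    # file first, record the outcome, stop at a target recall estimate.
    import random

    random.seed(0)
    N = 100
    defect_prob = [random.random() * 0.3 for _ in range(N)]   # stand-in model
    truth = {i for i in range(N) if random.random() < defect_prob[i]}

    tested, found = set(), set()
    target_recall = 0.95
    while len(tested) < N:
        i = max((j for j in range(N) if j not in tested),
                key=lambda j: defect_prob[j])
        tested.add(i)                         # "run" the test tool on file i
        if i in truth:
            found.add(i)
        # Estimate total defects: actual outcomes where tested, model elsewhere.
        est_total = len(found) + sum(defect_prob[j] for j in range(N)
                                     if j not in tested)
        if est_total > 0 and len(found) / est_total >= target_recall:
            break                             # estimated recall reached
    print(f"tested {len(tested)}/{N} files, found {len(found)} defects")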
Semantic Restoration and Automatic Transplant for ROP Exploit Script
SHI Rui-heng, ZHU Yun-cong, ZHAO Yi-ru, ZHAO Lei
Computer Science. 2022, 49 (11): 49-54.  doi:10.11896/jsjkx.210900230
Exploit scripts play an important role in security research. Security researchers need to study how an exploit script triggers and exploits a vulnerability, so as to effectively protect the vulnerable program. However, many exploit scripts obtained from the network have poor generality and adaptability: they are limited to a specific operating system and execution environment, and a change of environment leads to execution failure. This problem is particularly common in exploit scripts based on return-oriented programming (ROP), making the transplantation and analysis of ROP scripts difficult and reliant on manual assistance and expert knowledge. To solve this problem, we propose the ROPTrans system, which locates the key semantics and environment-related variables by analyzing the semantics of the ROP script, and then automatically generates an ROP script adapted to the target environment, so as to transplant ROP scripts automatically. Experimental results show that the success rate of ROPTrans reaches up to 80%, which verifies the effectiveness of our method.
Study on Effectiveness of Quality Objectives and Non-quality Objectives for Automated Software Refactoring
GUO Ya-lin, LI Xiao-chen, REN Zhi-lei, JIANG He
Computer Science. 2022, 49 (11): 55-64.  doi:10.11896/jsjkx.220300058
The cost of software maintenance increases with the continuous iterative development of software. To reduce this cost, automated software refactoring has proven to be an effective solution. One of the most typical automated refactoring approaches is search-based refactoring, which refactors software according to some predefined objectives. Existing studies show that the selection of objectives plays a decisive role in the search process. Although many quality objectives and non-quality objectives have been proposed, no research has analyzed, within the same evaluation framework, which objectives can guide software refactoring to results that meet developers' expectations; likewise, whether combining quality objectives and non-quality objectives yields better results has not been analyzed. To answer these questions, this paper designs a search-based multi-objective software refactoring evaluation framework. Under this framework, the paper investigates the impact of combinations of seven different objectives on software refactoring with six open-source software projects of different scales; the software quality before and after refactoring is evaluated using a variety of indicators, and the effectiveness of different objective combinations is analyzed. Experimental results show that combining quality objectives and non-quality objectives improves the effectiveness of software refactoring more than using a single type of objective, and that the combination of the 'quality' objective and the 'similarity with recorded code changes' objective is important to the improvement of refactoring effectiveness.
Web Application Page Element Recognition and Visual Script Generation Based on Machine Vision
LI Zi-dong, YAO Yi-fei, WANG Wei-wei, ZHAO Rui-lian
Computer Science. 2022, 49 (11): 65-75.  doi:10.11896/jsjkx.220200122
To provide richer interactive responses, the visual elements of Web applications are becoming more complex and diverse. Traditional DOM-based testing cannot meet the new requirements of Web application testing, while a new generation of testing based on computer vision provides an efficient way to handle the complex elements in Web applications. Such vision-based Web testing mainly depends on template matching to recognize page elements so that visual test scripts can be generated. However, subtle changes in the appearance of page elements can cause template matching to fail, so that the page elements cannot be recognized and the visual test script cannot be generated. How to improve the accuracy of machine-vision-based Web page element recognition and keep it applicable under complex conditions is a challenging task. Object detection based on deep learning is a research hotspot in computer vision and machine learning; thanks to the deep feature representations gained through large-sample learning, it can recognize targets accurately and has strong generalization ability. Therefore, this paper targets Web applications, starts from the visual characteristics of page elements, and proposes a Web page element recognition method based on deep learning. The method uses a YOLOv3-based object detection model to automatically localize the position and boundary of an element, recognize the type of the page element, and describe its function. On the basis of page element recognition, it can automatically generate visual test scripts for Web applications. To verify the accuracy of vision-based page element recognition, experiments are conducted between different versions of the same Web application and between different Web applications. The results show that the average recall of the machine-vision-based Web page element recognition model is 75.6%. It provides a foundation for visual test script generation for Web applications.
Optimization Method of Streaming Storage Based on GCC Compiler
GAO Xiu-wu, HUANG Liang-ming, JIANG Jun
Computer Science. 2022, 49 (11): 76-82.  doi:10.11896/jsjkx.211200252
To solve the cache pollution and compulsory misses caused by streaming memory accesses, some high-performance general-purpose processor platforms provide a dedicated path and supporting instructions for accessing main memory directly without going through the cache. The overall performance of the chip's memory system can be improved by using direct memory access in common application scenarios such as streaming storage. However, determining when direct access to main memory is beneficial is a tedious and error-prone task for programmers, and an effective way is to automate it in the compiler. Therefore, based on an in-depth analysis of the benefits of different types of access operations under the streaming storage access pattern, this paper proposes a streaming storage optimization method based on the GCC compiler. In the SSA-GIMPLE stage of GCC, continuous or strided writes with streaming access characteristics in program loops are recognized, and optimization candidates are screened according to benefit analysis and dependency relationships. Finally, direct main-memory access instructions are generated by matching instruction templates in the compiler back end. Continuous/strided-write cases and the STREAM test set and their variants are used for experimental evaluation on a domestic SW processor platform. The results show that the optimization can significantly reduce the execution time of streaming storage applications, and the average speedup on the STREAM test set after optimization is 1.31. Additionally, in conjunction with loop unrolling, the STREAM test set achieves an average speedup of 1.45.
Patch Validation Approach Combining Doc2Vec and BERT Embedding Technologies
HUANG Ying, JIANG Shu-juan, JIANG Ting-ting
Computer Science. 2022, 49 (11): 83-89.  doi:10.11896/jsjkx.210900207
Automatic program repair is a research hotspot in recent years and has made considerable progress. Most existing automatic program repair methods use the test suite to validate patch correctness. However, using the test suite to validate a large number of candidate patches not only brings huge costs, but also leads to patch overfitting. Therefore, how to improve the efficiency of patch validation and effectively validate patch correctness has become an urgent problem. To reduce the cost and improve patch accuracy, this paper proposes an approach combining two embedding techniques to validate patch correctness. Firstly, the approach uses a Doc2Vec model to calculate the similarity between the patch and the buggy code; then a classifier based on a BERT model filters out incorrect patches from those screened by similarity. To evaluate the effectiveness of the approach, experiments are carried out on five open-source Java benchmarks. Experimental results show that the approach can effectively validate patch correctness and improve the efficiency of patch validation.
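The first stage (similarity screening) can be illustrated with gensim's Doc2Vec; the tiny code corpus, tokenization and threshold idea below are assumptions of this sketch, and the BERT-based classifier stage is omitted:

    # Stage 1 sketch: embed patch and buggy code with Doc2Vec and keep
    # patches whose cosine similarity to the buggy code clears a threshold.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from numpy import dot
    from numpy.linalg import norm

    code_corpus = [                       # hypothetical token streams
        "if x is None : return 0".split(),
        "for i in range ( n ) : total += a [ i ]".split(),
        "return total / n if n else 0".split(),
    ]
    docs = [TaggedDocument(toks, [i]) for i, toks in enumerate(code_corpus)]
    model = Doc2Vec(docs, vector_size=32, min_count=1, epochs=50)

    buggy = "return total / n".split()
    patch = "return total / n if n else 0".split()
    v_bug, v_patch = model.infer_vector(buggy), model.infer_vector(patch)
    sim = dot(v_bug, v_patch) / (norm(v_bug) * norm(v_patch))
    print("similarity:", round(float(sim), 3))
    # Patches below a tuned threshold would be rejected before the
    # (omitted) BERT-based correctness classifier runs.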
Database & Big Data & Data Science
Temporal RDF Modeling Based on Relational Database
HAN Xiao, ZHANG Zhe-qing, YAN Li
Computer Science. 2022, 49 (11): 90-97.  doi:10.11896/jsjkx.211100065
With the increase of temporal data, the concept of the temporal knowledge graph has become popular, and how to represent temporal knowledge graphs efficiently has become an important research direction. Although the resource description framework (RDF) is widely used in traditional knowledge graph modeling, it can only represent static semantics and lacks the ability to represent temporal knowledge graphs. Several temporal RDF models have therefore been proposed, but they simply attach temporal information to the predicate of the RDF triple or to the whole triple, and lack accurate positioning of the object to which the temporal information belongs. To better represent temporal knowledge graphs, this paper first proposes a new temporal RDF representation model called tRDF, which attaches temporal information to the object or the predicate according to the type of the object. Secondly, drawing on concepts from temporal databases, it presents a tRDF data storage method based on the relational database PostgreSQL. Finally, the proposed storage method is evaluated in two respects: storage time and space consumption. Experimental results show that the proposed scheme can effectively represent temporal knowledge graphs.
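A relational layout for temporal triples in this spirit, with the valid-time interval stored alongside the triple, can be sketched as follows; SQLite stands in for PostgreSQL here, and the schema is an illustrative assumption rather than the paper's exact design:

    # Sketch: store temporal RDF triples relationally, with the valid-time
    # interval beside the triple (SQLite stands in for PostgreSQL).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE trdf (
            subject   TEXT,
            predicate TEXT,
            object    TEXT,
            t_start   TEXT,   -- start of validity interval
            t_end     TEXT    -- end of validity interval
        )""")
    conn.executemany(
        "INSERT INTO trdf VALUES (?, ?, ?, ?, ?)",
        [("Alice", "worksFor", "LabA", "2015-01-01", "2018-06-30"),
         ("Alice", "worksFor", "LabB", "2018-07-01", "9999-12-31")])

    # Temporal query: where did Alice work on 2017-03-15?
    rows = conn.execute(
        "SELECT object FROM trdf WHERE subject=? AND predicate=? "
        "AND t_start<=? AND ?<=t_end",
        ("Alice", "worksFor", "2017-03-15", "2017-03-15")).fetchall()
    print(rows)   # [('LabA',)]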
Incremental Feature Selection Algorithm for Dynamic Partially Labeled Hybrid Data
YAN Zhen-chao, SHU Wen-hao, XIE Xin
Computer Science. 2022, 49 (11): 98-108.  doi:10.11896/jsjkx.210900076
Many real-world data sets are hybrid data consisting of symbolic, numerical and missing features. Acquiring the decision labels of all data costs much labor and is expensive, which gives rise to partially labeled data. Meanwhile, data in real-world applications change dynamically, i.e., features are dynamically added to and deleted from the feature set as requirements change. In this paper, incremental feature selection algorithms are proposed for the high-dimensional, partially labeled and dynamic characteristics of hybrid data. Firstly, information granularity is used to analyze feature significance for partially labeled hybrid data. Then, incremental updating mechanisms for information granularity under the variation of the feature set are proposed. On this basis, incremental feature selection algorithms for partially labeled hybrid data are developed. Finally, extensive experimental results on UCI data sets demonstrate that the proposed algorithms are feasible and efficient.
Semantic Information Enhanced Network Embedding with Completely Imbalanced Labels
FU Kun, GUO Yun-peng, ZHUO Jia-ming, LI Jia-ning, LIU Qi
Computer Science. 2022, 49 (11): 109-116.  doi:10.11896/jsjkx.210900101
The problem of data incompleteness has become an intractable obstacle for network representation learning (NRL), preventing existing NRL algorithms from achieving the expected results. Although numerous efforts have been made to solve the issue, most previous methods mainly focus on the lack of label information and rarely consider data imbalance, especially the completely imbalanced problem in which the labels of certain classes are completely missing. Learning algorithms for such problems are still being explored; for example, some neighborhood feature aggregation processes prefer to focus on network structure information while disregarding the relationships between attribute features and semantic features, whose utilization may enhance representation results. To address the above problems, a semantic information enhanced network embedding method for completely imbalanced labels (SECT), which combines attribute features and structural features, is proposed in this paper. Firstly, SECT introduces an attention mechanism into supervised learning to obtain the semantic information vector while considering the relationship between the attribute space and the semantic space. Secondly, a variational autoencoder is applied to extract structural features in an unsupervised mode to enhance the robustness of the algorithm. Finally, both semantic and structural information are integrated in the embedding space. Compared with two state-of-the-art algorithms, the node classification results on the public data sets Cora and Citeseer indicate that the network vectors obtained by SECT outperform others, with Micro-F1 improvements of 0.86%~1.97%. The node visualization results also show that, compared with other algorithms, the distances among different-class clusters obtained by SECT are larger, same-class clusters are more compact, and class boundaries are more obvious. All these results demonstrate the effectiveness of SECT, which mainly benefits from a better fusion of semantic information in the low-dimensional embedding space, thus greatly improving node classification performance under completely imbalanced labels.
Modeling User Micro-Behavior via Adaptive Multi-Attention Network for Session-based Recommendation
QIAO Jing-jing, WANG Li
Computer Science. 2022, 49 (11): 117-125.  doi:10.11896/jsjkx.210900061
Session-based recommendation (SR) aims to recommend the next item that matches a user's preferences based on short-term session information. It needs neither user profiles nor long-term historical information and has broad application prospects. Existing SR models usually focus on user click behavior or use only a single type of behavior data, ignoring the specific semantics of user actions such as browsing, collecting, carting and purchasing. These behaviors with different semantics are called micro-behaviors; they reflect, at the micro level, the transformation of user intention and the decision-making process during shopping, and provide valuable information for improving recommendation. Therefore, an adaptive multi-attention network (AMAN) based on user micro-behavior is proposed for session-based recommendation. First, we model the micro-behavior sequence as a heterogeneous directed graph, and then build three components for session-based recommendation. A directed graph attention network (DGAT) learns item representations at the item level and adaptively captures the associations between items with the same micro-operation. An operation-level heterogeneous graph attention network (OHGAT) learns item representations at the operation level and adaptively captures the associations between items with different micro-operations. A micro-behavior co-attention network (MBCAT) learns the representation of micro-behavior sequences and adaptively captures the dependencies between different micro-behavior sequences. Experimental results on the Yoochoose, Taobao14 and Taobao15 datasets show that AMAN outperforms state-of-the-art baselines.
Computer Graphics & Multimedia
Variational Domain Adaptation Driven Semantic Segmentation of Urban Scenes
JIN Yu-jie, CHU Xu, WANG Ya-sha, ZHAO Jun-feng
Computer Science. 2022, 49 (11): 126-133.  doi:10.11896/jsjkx.220500193
Semantic segmentation of urban scenes aims to identify and segment persons, obstacles, roads, signs and other elements in an image, and to provide information about free space on the road for vehicles. It is one of the key technologies of automatic driving. High-performance semantic segmentation systems rely heavily on a large amount of real annotated data for training. However, labeling each pixel in an image is costly and often impractical. One approach is to collect photo-realistic synthetic data from video games, where pixel-level annotations can be generated automatically at low cost, to train a machine learning model to segment real-world images; this corresponds to domain adaptation. Different from current mainstream semantic segmentation domain adaptation methods based on Vapnik-Chervonenkis dimension theory or Rademacher complexity theory, our method is inspired by a PAC-Bayes upper bound on the target-domain Gibbs risk that is compatible with pseudo labels, and considers the average case over the hypothesis space rather than the worst case, thereby avoiding over-constraining the domain discrepancy in the latent space, which would make the upper bound of the target-domain generalization error impossible to estimate and optimize effectively. Guided by these ideas, this paper proposes a variational inference method for semantic segmentation adaptation (VISA). A dropout variational family is used for variational inference. While solving for the ideal posterior distribution in the hypothesis space, an approximate Bayes classifier can be obtained quickly, and the estimate of the risk upper bound is made more accurate by minimizing the entropy of the target domain and filtering pixels. Experiments show that the mean intersection over union (mIoU) of VISA is 0.5%~6.6% higher than that of baseline methods, with high accuracy on pedestrians, vehicles and other urban scene elements.
Granularity-aware and Semantic Aggregation Based Image-Text Retrieval Network
MIAO Lan-xin, LEI Yu, ZENG Peng-peng, LI Xiao-yu, SONG Jing-kuan
Computer Science. 2022, 49 (11): 134-140.  doi:10.11896/jsjkx.220600010
Image-text retrieval is a basic task in the visual-language domain, aiming at mining the relationships between different modalities. However, most existing approaches rely heavily on associating specific regions of an image with each semantically similar word in a sentence and underappreciate the significance of multi-granular information in images, resulting in irrelevant matches between the two modalities and semantically ambiguous embeddings. Generally, an image contains object-level, action-level, relationship-level or even scene-level information that is not explicitly labeled, so it is challenging to align complex visual information with ambiguous descriptions. To tackle this issue, this paper proposes a granularity-aware and semantic aggregating (GASA) network to obtain multi-granular visual representations and narrow the cross-modal gap. Specifically, the granularity-aware feature selection module selects rich multi-granularity information from images and conducts a multi-scale fusion, guided by an adaptive gated fusion mechanism and a pyramid structure. The semantic aggregation module clusters the multi-granularity information from visual and textual clues in a shared space to obtain residual representations. Experiments are conducted on two benchmark datasets, and the results show our model outperforms the state of the art by over 2% on R@1 on MSCOCO 1k and by 4.1% on R@Sum on Flickr30k.
Edge Guided Self-correction Skin Detection
ZHENG Shun-yuan, HU Liang-xiao, LYU Xiao-qian, SUN Xin, ZHANG Sheng-ping
Computer Science. 2022, 49 (11): 141-147.  doi:10.11896/jsjkx.220600012
Skin detection has been a widely studied computer vision topic for many years, yet remains a challenging task. Previous methods succeed in various ordinary scenarios but still suffer from fragmentary predictions and poor generalization. To address this issue, this paper proposes an edge-guided network, driven by a massive self-corrected skin detection dataset, for robust skin detection. Specifically, a multi-task learning network that conducts skin detection and edge detection jointly is proposed. The predicted edge map is further merged into the skin detection stream via an edge attention module. Meanwhile, to exploit large-scale, low-quality data from the human parsing task to strengthen the generalization of the network, a self-correction algorithm is adopted to mitigate the side effects of supervision by noisy labels, continuously polishing up label defects during the training process. Experimental results indicate that the proposed method outperforms the state of the art in skin detection.
Handwritten Character Recognition Based on Decomposition Extreme Learning Machine
HE Yu-lin, LI Xu, JIN Yi, HUANG Zhe-xue
Computer Science. 2022, 49 (11): 148-155.  doi:10.11896/jsjkx.211200265
Handwritten character recognition (HCR) is an important branch of image recognition, which recognizes handwritten characters with data mining and machine learning technologies. Current HCR methods mainly focus on improvements to different deep learning models, among which the multi-layer extreme learning machine (ML-ELM) has attracted wide attention from academia and industry due to its faster training speed and better recognition performance than the deep belief network (DBN) and deep Boltzmann machine (DBM). However, the recognition performance of ML-ELM is severely influenced by the random weights used to determine the input weights of each hidden layer. This paper first proposes a decomposition ELM (DE-ELM), a shallow ELM training scheme based on decomposing the hidden-layer output matrix, and then applies DE-ELM to HCR problems: handwritten digits in MNIST, handwritten digits and English letters in EMNIST, and handwritten Japanese characters in KMNIST and K49-MNIST. In comparison with ML-ELM, DE-ELM reduces the randomness of ELM-based HCR models. Meanwhile, DE-ELM obtains higher recognition accuracy than ML-ELM given the same training time, and trains faster than ML-ELM at equal recognition accuracy. Experimental results demonstrate the feasibility and effectiveness of the proposed DE-ELM on HCR problems.
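For reference, a classical single-hidden-layer ELM draws random input weights and solves the output weights in closed form from the hidden-layer output matrix, the matrix whose decomposition DE-ELM refines; this NumPy sketch shows only that classical baseline on synthetic data, not the decomposition itself:

    # Classical single-hidden-layer ELM: random input weights, closed-form
    # output weights via the pseudo-inverse of the hidden-layer output matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))                # toy inputs
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy labels
    T = np.eye(2)[y]                              # one-hot targets

    n_hidden = 64
    W = rng.normal(size=(20, n_hidden))           # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                  # closed-form output weights

    pred = (H @ beta).argmax(axis=1)
    print("train accuracy:", (pred == y).mean())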
Temporal Relation Guided Knowledge Distillation for Continuous Sign Language Recognition
XIAO Zheng-ye, LIN Shi-quan, WAN Xiu-an, FANG Yu-chun, NI Lan
Computer Science. 2022, 49 (11): 156-162.  doi:10.11896/jsjkx.220600036
Previous research in continuous sign language recognition mainly focuses on the RGB modality and achieves remarkable performance on real-world and laboratory datasets, but usually requires intensive computation. The skeleton, on the other hand, is a modality with small input data and fast computation, but performs poorly on real-world datasets. This paper proposes a cross-modal knowledge distillation method named temporally related knowledge distillation (TRKD) to alleviate the contradiction between the RGB and skeleton modalities in performance and computation speed. TRKD uses the RGB-modality network as a teacher to guide the skeleton-modality network toward fast and accurate recognition. We notice that the teacher's understanding of sign language context is worth learning by the student, and propose to employ a graph convolutional network (GCN) to learn and align the temporally related features of the teacher and student networks to achieve this goal. Moreover, since the learnable GCN parameters in the teacher network make its supervision unavailable to traditional loss functions, we introduce contrastive learning to provide self-supervised information. Multiple ablation experiments on the Phoenix-2014 dataset demonstrate the effectiveness of the proposed method.
Handwritten Image Binarization Based on Background Estimation and Local Adaptive Integration
HE Huang-xing, CHEN Ai-guo, WANG Jiao-long
Computer Science. 2022, 49 (11): 163-169.  doi:10.11896/jsjkx.210900225
Handwritten document images often suffer from uneven lighting, ink stains, paper degradation, shadows and other complex conditions. To solve the problem that OCR results on document images binarized under complex backgrounds are unsatisfactory, a binarization method combining improved background estimation and local adaptive integration is proposed. In this method, a locally adaptive binarization method is first used to obtain a binarized image with high recall; then an improved background estimation method is used to obtain a binarized image with high accuracy; finally, the two binarized images are integrated based on a connected-component method to obtain the final binarized image. Experimental results on the DIBCO2013 and DIBCO2016 handwritten data sets show that the proposed method has better overall performance than Otsu, Wolf, Niblack, Sauvola, Singh and Howe.
Driver Distraction Detection Based on Multi-scale Feature Fusion Network
ZHANG Yu-xin, CHEN Yi-qiang
Computer Science. 2022, 49 (11): 170-178.  doi:10.11896/jsjkx.211000040
Road traffic accidents have increased year by year, and driver inattention during driving is one of their major causes. In this paper, we utilize multi-source data to detect driver distraction. However, the correlations within multi-source data generate highly entangled high-dimensional features. Existing methods process data from different sources similarly or simply concatenate multi-source features, which makes it hard to capture the key information in such entangled features. Moreover, distracted driving can be affected by many factors, and supervised methods may misclassify when the type of driver distraction does not exist in the set of known categories. Therefore, we propose a multi-scale feature fusion network to tackle these challenges. It first learns low-dimensional representations from multi-source data through multiple embedding subnetworks, then aggregates these representations with a multi-scale feature fusion method from the perspective of spatial-temporal correlation, thereby reducing feature entanglement. Finally, a ConvLSTM encoder-decoder model is used to detect driver distraction. Experimental results on a public driving dataset show that the proposed method outperforms existing methods.
Traffic Sign Detection and Recognition Method Based on Optimized YOLO-V4
PAN Hui-ping, WANG Min-qin, ZHANG Fu-quan
Computer Science. 2022, 49 (11): 179-184.  doi:10.11896/jsjkx.220300251
Traffic sign detection and recognition is a core function of automatic driving systems. To identify traffic signs accurately and in real time, a method is developed on the basis of YOLO-V4 combined with a spatial pyramid pooling (SPP) module. Firstly, to increase resolution and receptive field, the resolutions of the three scales of the original feature map are changed to 26×26 and 52×52. Then, an SPP module is added to the connection layer to remove the network's fixed-scale constraint, obtain optimal features from the max-pooling layers, and improve network performance. In experiments, various traffic sign images collected by a driving recorder are used; compared with other strong methods, the proposed method achieves better performance, with an average detection and recognition accuracy of 99.0% and an average detection time of 0.449 s, meeting the requirements of real-time detection.
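A minimal SPP block of the kind added here concatenates the input with max-poolings at several kernel sizes; the kernel sizes below (5, 9, 13, as commonly used in YOLO variants) are an assumption of this PyTorch sketch:

    # Minimal SPP block: parallel max-poolings at several scales, stride 1
    # with padding so spatial size is preserved, concatenated with the input.
    import torch
    import torch.nn as nn

    class SPP(nn.Module):
        def __init__(self, kernel_sizes=(5, 9, 13)):
            super().__init__()
            self.pools = nn.ModuleList(
                nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
                for k in kernel_sizes)

        def forward(self, x):
            return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

    feat = torch.randn(1, 256, 26, 26)     # a feature map at one scale
    print(SPP()(feat).shape)               # torch.Size([1, 1024, 26, 26])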
Artificial Intelligence
Methods of Patent Knowledge Graph Construction
DENG Liang, CAO Cun-gen
Computer Science. 2022, 49 (11): 185-196.  doi:10.11896/jsjkx.211100063
Patent knowledge graphs play an important role in accurate patent retrieval, in-depth patent analysis and patent knowledge training. This paper proposes a practical patent knowledge graph construction method based on a seed knowledge graph, text mining and relation completion. In this method, to ensure quality, a seed patent knowledge graph is first established manually; then concept and relation extraction based on patent text patterns is used to expand the seed graph; finally, the extended patent knowledge graph is evaluated quantitatively. In this paper, manual extraction of seed knowledge and manual summarization of lexical and syntactic patterns are carried out for patents in the field of traditional Chinese medicine. After new lexical and syntactic patterns are obtained by machine learning, the seed patent knowledge graph is expanded and completed. Experimental results show that the numbers of nodes and relationships in the initial traditional Chinese medicine knowledge graph are 19 453 and 194 775, respectively; after expansion, they reach 558 461 and 7 275 958, increases of 27.7-fold and 36.3-fold, respectively.
Prediction Model of Enterprise Resilience Based on Bi-directional Long Short-term Memory Network
SONG Mei-qi, FU Xiang-ling, YAN Chen-wei, WU Wei-qiang, REN Yun
Computer Science. 2022, 49 (11): 197-205.  doi:10.11896/jsjkx.210900195
Traditional risk management methods focus on identifying, predicting and assessing potential risks. However, when enterprises face uncertainty and unexpected risks, traditional methods cannot cope. Therefore, academia has gradually shifted the perspective of risk management from predicting and avoiding risks to improving the ability of enterprises to withstand and recover from risks, that is, enterprise resilience. This paper proposes a method to predict enterprise resilience based on temporal features: a Bi-LSTM encodes the temporal features to obtain a feature representation of each enterprise, and the classification result is produced by a softmax classifier. The proposed method is validated on real-world data from listed companies in China, and the macro-F1 value reaches 89.0%, an improvement over models that do not consider temporal features, such as RF, XGBoost and LightGBM. The paper further discusses the importance of various factors that influence enterprise resilience. This is the first application of machine learning methods to the evaluation and prediction of enterprise resilience, providing theoretical and methodological guidance for enterprises to deal with unexpected risks.
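The described model (a Bi-LSTM over each enterprise's temporal feature sequence, followed by a softmax classifier) can be outlined in PyTorch as below; the feature dimension, sequence length and class count are illustrative assumptions:

    # Sketch: encode a firm's temporal feature sequence with a Bi-LSTM and
    # classify resilience from the final hidden states.
    import torch
    import torch.nn as nn

    class ResilienceNet(nn.Module):
        def __init__(self, n_features=16, hidden=32, n_classes=2):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                                bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                    # x: (batch, years, features)
            _, (h, _) = self.lstm(x)
            h = torch.cat([h[0], h[1]], dim=1)   # forward + backward states
            return self.head(h)                  # logits; softmax in the loss

    x = torch.randn(8, 5, 16)               # 8 firms, 5 years, 16 indicators
    print(ResilienceNet()(x).shape)         # torch.Size([8, 2])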
Relation Extraction Based on Multidimensional Semantic Mapping
CHENG Hua-ling, CHEN Yan-ping, YANG Wei-zhe, QIN Yong-bin, HUANG Rui-zhang
Computer Science. 2022, 49 (11): 206-211.  doi:10.11896/jsjkx.210900120
Relation extraction aims to identify the relation types between entities in texts. Most existing relation extraction methods use deep learning, but they do not examine in depth the word vectors in the input layer. To further exploit word vectors, this paper proposes a relation extraction method based on multi-dimensional semantic mapping. The core idea is to reduce the dimensionality of the text feature matrix before the word vectors enter the input layer. Experimental results show that the proposed method not only reduces dimensionality effectively, but also represents the semantic information of the same sentence in different dimensions, achieving F1 scores of 75.3% and 88.9% on the Chinese Literature Text and SemEval-2010 Task8 datasets, respectively.
Universal Multi-class Ensemble Method with Self Adaptive Weights
WEI Jun-sheng, LIU Yan, CHEN Jing, DUAN Shun-ran
Computer Science. 2022, 49 (11): 212-220.  doi:10.11896/jsjkx.210900054
Ensemble learning has always been one of the strategies for building powerful and stable predictive models; it improves the accuracy and stability of results by fusing multiple models. However, existing ensemble methods still have shortcomings in weight calculation: when facing various classification problems, they cannot adaptively select ensemble weights and thus lack universality. In view of these problems, a universal multi-class ensemble method with self-adaptive weights (UMEAW) is proposed. Unlike usual ensemble classification methods that target only one kind of classification task, UMEAW first calculates a weight allocation coefficient according to the number of classes; then the weights of the base classifiers are calculated automatically from the model evaluation metric and the weight allocation coefficient, using the distribution characteristics of the exponential function; finally, the weights are adjusted adaptively through continuous iteration to realize model ensembling for different classification tasks. Experimental results show that UMEAW achieves model ensembling on 9 datasets with different numbers of classes, from different fields and of different scales, and outperforms the baselines in most tasks. Compared with a single model, the F1 value increases by 3%~25% after UMEAW fusion; compared with other ensemble methods, the F1 value improves by 1%~2%. This proves that UMEAW is universal and effective.
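The flavor of the weighting scheme (base-classifier weights that grow exponentially with the evaluation metric, scaled by a coefficient derived from the class count) can be sketched as follows; the coefficient formula and data here are assumptions, not the paper's exact definitions:

    # Sketch of metric-driven exponential ensemble weights: each base
    # classifier's weight grows exponentially with its evaluation score.
    import numpy as np

    np.random.seed(0)
    f1_scores = np.array([0.71, 0.80, 0.76])   # base classifiers' F1 values
    n_classes = 4
    alpha = np.log(n_classes)                   # assumed allocation coefficient

    weights = np.exp(alpha * f1_scores)
    weights /= weights.sum()                    # normalize to sum to 1
    print(weights.round(3))

    # Fuse per-class probability outputs of the base classifiers.
    probs = np.random.dirichlet(np.ones(n_classes), size=(3, 10))  # (clf, sample, class)
    fused = np.tensordot(weights, probs, axes=1)                    # weighted sum
    print(fused.argmax(axis=1))                 # ensemble predictions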
Incorporating Part of Speech and Tonal Features for Vietnamese Grammatical Error Detection
ZHANG Zhou, ZHU Jun-guo, YU Zheng-tao
Computer Science. 2022, 49 (11): 221-227.  doi:10.11896/jsjkx.210900247
The BERT pre-trained language model removes the tones of syllables when segmenting Vietnamese words, which loses some semantic information during the training of a grammatical error detection model. To address this problem, an approach combining part-of-speech and tonal features is proposed to complete the semantic information of the input syllables. Grammatical error detection also confronts insufficient training data due to the scarcity of labeled Vietnamese data. To address this, a data augmentation algorithm is designed to generate a large number of erroneous texts from correct corpora. Experimental results on Vietnamese Wikipedia and news corpora show that the proposed method achieves the highest F0.5 and F1 scores on the test set, proving that it improves detection performance. Both the proposed method and the baseline improve gradually as the scale of the generated data increases, which proves that the proposed data augmentation algorithm is effective.
Detection Method of Rebar in Concrete Diameter Based on Improved Grey Wolf Optimizer-based SVR
LU Chun-yi, YU Jin, YU Zhong-dong, DING Shuang-song, ZHANG Zhan-long, QIU Ke-cheng
Computer Science. 2022, 49 (11): 228-233.  doi:10.11896/jsjkx.210800039
Traditional reinforced concrete detection methods use linear fitting or standard-value look-up tables, which can only roughly estimate the diameter of rebar. Considering that sample data for diameter detection are scarce and that the detection result varies non-linearly with the burial depth and the distance between adjacent rebars, an SVR detection method based on an improved grey wolf optimizer (IGWO-SVR) is proposed. Firstly, an opposition-based learning strategy is used to optimize the initial population distribution, which improves the global search ability of GWO, and a random differential mutation strategy is used to expand the search range, which keeps the GWO algorithm from falling into local optima. Then, the IGWO algorithm is applied to optimize the core parameters of the SVR to improve detection performance. Finally, comparison and analysis of experimental results against three other algorithm models show that the accuracy of the proposed method in rebar diameter detection is effectively improved.
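The overall pattern (a wolf encodes the SVR hyperparameters, fitness is validation error) can be illustrated with a plain GWO loop; the opposition-based initialization and random differential mutation improvements are omitted, so this sketch shows only the baseline scheme on toy data:

    # Baseline GWO tuning the SVR hyperparameters (C, gamma); each wolf is
    # a candidate (C, gamma) pair and fitness is validation MSE.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, size=(120, 3))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)        # toy signal
    Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

    lo, hi = np.array([0.1, 1e-3]), np.array([100.0, 1.0])  # (C, gamma) bounds

    def fitness(w):
        m = SVR(C=w[0], gamma=w[1]).fit(Xtr, ytr)
        return mean_squared_error(yva, m.predict(Xva))

    wolves = rng.uniform(lo, hi, size=(8, 2))
    for t in range(20):
        a = 2 - 2 * t / 20                                  # exploration factor
        order = np.argsort([fitness(w) for w in wolves])
        alpha, beta, delta = wolves[order[:3]]
        for i in range(len(wolves)):
            moves = []
            for leader in (alpha, beta, delta):
                A = a * (2 * rng.random(2) - 1)
                C = 2 * rng.random(2)
                moves.append(leader - A * np.abs(C * leader - wolves[i]))
            wolves[i] = np.clip(np.mean(moves, axis=0), lo, hi)

    best = min(wolves, key=fitness)
    print("best C=%.3f gamma=%.4f" % (best[0], best[1]))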
Computer Network
Survey of Resource Management Optimization of UAV Edge Computing
YUAN Xin-wang, XIE Zhi-dong, TAN Xin
Computer Science. 2022, 49 (11): 234-241.  doi:10.11896/jsjkx.211100015
To meet the needs of intensive computing and low latency, mobile edge computing pushes the service resources of cloud computing to the network edge, closer to the terminal. Ground networks face challenges in scenarios such as complex terrain and equipment failure; with the assistance of unmanned aerial vehicles (UAVs), the flexibility and robustness of network deployment can be improved. UAVs have the advantages of low cost, convenient operation and flexible mobility. Due to limitations of volume and weight, their power, communication and computing resources are often limited, and heterogeneity and dynamics gradually emerge in multi-UAV collaboration. Therefore, how to make efficient use of these resources has become a research hotspot. This paper surveys the problems and challenges faced in the promotion and application of UAV edge computing networks, analyzes and summarizes the current research status in power control, channel allocation, computing service resource management, and joint resource optimization, and summarizes and compares feasible resource management optimization solutions. Finally, future development trends of resource management optimization are analyzed and discussed.
Review of Mobile Air-Ground Crowdsensing
CHENG Wen-hui, ZHANG Qian-yuan, CHENG Liang-hua, XIANG Chao-can, YANG Zhen-dong, SHEN Xin, ZHANG Nai-fan
Computer Science. 2022, 49 (11): 242-249.  doi:10.11896/jsjkx.220400264
As an emerging sensing mode, mobile crowdsensing can realize low-cost, large-scale urban sensing by reusing the large number of existing mobile sensing resources in the air and on the ground. Jointly utilizing air and ground mobile sensing resources to realize air-ground cooperative mobile crowdsensing is therefore of great significance for improving the utilization of mobile sensing resources and promoting the development of smart cities. To this end, this paper reviews recent research on air-ground cooperative mobile crowdsensing. It first introduces the background and development status of air-ground cooperative mobile crowdsensing, then analyzes existing research on mobile crowdsensing along the two dimensions of ground-based and air-based mobile devices and summarizes the open problems. Finally, three important future research directions for air-ground cooperative mobile crowdsensing are proposed: cross-platform user information learning, cross-air-ground mobile device scheduling, and cross-task sensing resource allocation, providing a valuable reference for researchers.
Survey of Multi-cloud Workflow Scheduling
YU Hao-wen, LIU Bo, ZHOU Na-qin, LIN Wei-wei, LIU Peng
Computer Science. 2022, 49 (11): 250-258.  doi:10.11896/jsjkx.211200234
Traditional cloud providers serve users on their own, which leads to insufficient local cloud resources and high expansion costs. The emerging multi-cloud combines services from cloud providers in different geographical locations, gives users more choices, and has gradually become a research hotspot. Workflow scheduling is one of the key problems in multi-cloud research. Therefore, this paper first makes a thorough investigation and analysis of workflow scheduling technology in the multi-cloud environment, then compares and classifies the workflow scheduling methods covered. The paper focuses on single-objective workflow scheduling optimization for cost and completion time, and on multi-objective workflow scheduling optimization for cost and completion time, for response time and cost, for reliability, cost and completion time, and for other objective combinations. Finally, future research directions for workflow scheduling in the multi-cloud environment are discussed, including uncertainty-aware workflow scheduling, joint optimization of energy consumption and other objectives, scheduling optimization with edge servers, and hybrid scheduling across virtual machines and serverless platforms.
WiPasLoc: A Novel Passive Indoor Human Localization Method Based on WiFi
WANG Dong-zi, GUO Zheng-xin, GUI Lin-qing, HUANG Hai-ping, XIAO Fu
Computer Science. 2022, 49 (11): 259-265.  doi:10.11896/jsjkx.220500098
Passive indoor human localization is the basis for ubiquitous wireless sensing systems. However, commercial WiFi signals are easily affected by the surrounding environment, which makes it difficult for existing WiFi-based indoor localization work to accurately separate the dynamic human components from the complex received signals. To address this problem, this paper proposes WiPasLoc, a passive indoor human localization system that achieves high-accuracy indoor localization using channel state information (CSI) extracted from commercial WiFi devices. Firstly, Doppler frequency shift (DFS) estimation is carried out based on the signal quality of the CSI subcarriers. Then, the target person's signal component is precisely separated from the channel state information using a double-window-based angle-of-arrival (AoA) estimation method. Finally, combined with the person's initial position, accurate passive indoor localization is achieved by the proposed trajectory fitting algorithm. Experimental results show that the median error of WiPasLoc for indoor trajectory positioning is 80 cm, a 25.9% improvement over the positioning accuracy of the typical existing Widar2.0 system.
Workload Characteristics Based Performance Optimization for Edge Intelligence
HU Zhao-xia, HU Hai-zhou, JIANG Cong-feng, WAN Jian
Computer Science. 2022, 49 (11): 266-276.  doi:10.11896/jsjkx.211000067
Edge intelligence refers to a form of service that uses artificial intelligence algorithms to provide data analysis capabilities for network edge devices. However, the edge computing environment is more complex and changeable than cloud computing. Building edge intelligence faces many problems, such as the lack of quantitative evaluation standards, heterogeneous computing platforms, complex network topologies, and changing user needs. Among them, the most prominent is the contradiction between the high resource demands of algorithm models and the low resource reserves of edge devices. Machine learning is the main workload of edge intelligence and requires substantial computing resources, yet the computing resources of edge devices are limited; this mismatch between supply and demand makes the deployment and optimization of edge intelligence workloads a challenge. Therefore, for edge intelligence workload performance optimization, this paper proposes cloud-edge collaborative inference (CECI) based on workload characteristics, which optimizes different machine learning workloads in terms of model selection, batch adaptive adjustment and cloud-edge collaboration. For model selection, a model adaptive selection strategy based on target weights weighs multiple performance optimization targets under multiple constraints. For batch adaptive adjustment, a batch adjustment algorithm based on overhead feedback is proposed, so that the model achieves better performance at runtime. For cloud-edge collaboration, a strategy combining network status and user delay requirements is designed to exploit cloud computing resources dynamically. Experimental results show that, compared with cloud intelligence, the proposed workload-characteristics-based edge intelligence reduces program running time by 50.79%, reduces system energy consumption by 42.46%, and improves model accuracy by 4.52%.
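The batch adaptive adjustment idea (grow the batch while measured per-sample overhead falls, back off once it rises) can be sketched as a simple feedback loop; the timing model below is synthetic and stands in for real measured inference overhead:

    # Feedback-controlled batch sizing: increase the batch while per-sample
    # latency improves, back off once the measured overhead rises again.
    def measure_per_sample_ms(batch):
        # Synthetic stand-in for measured inference time: fixed launch cost
        # amortized over the batch, plus growing memory pressure beyond 32.
        fixed, per_item = 40.0, 2.0
        pressure = 0.05 * max(0, batch - 32) ** 2
        return (fixed + per_item * batch + pressure) / batch

    batch, best = 1, float("inf")
    while batch <= 128:
        cost = measure_per_sample_ms(batch)
        if cost >= best:            # overhead feedback: stop growing
            batch //= 2             # back off to the last good size
            break
        best, batch = cost, batch * 2
    print("selected batch size:", batch, "at %.2f ms/sample" % best)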
Novel Predictive Approach to Trajectory-aware Online Edge Service Allocation in Edge Environment
LI Xiao-bo, CHEN Peng, SHUAI Bin, XIA Yun-ni, LI Jian-qi
Computer Science. 2022, 49 (11): 277-283.  doi:10.11896/jsjkx.211100029
The rapid development of mobile communication technology has promoted the emergence of mobile edge computing (MEC). As a key technology of fifth-generation (5G) wireless networks, MEC can use the radio access network to provide the services and cloud computing functions required by telecom users nearby, so as to create a high-performance, low-delay and high-bandwidth service environment and accelerate the delivery of contents, services and applications in the network. However, it remains a great challenge to provide effective and performance-guaranteed strategies for service offloading and migration in the MEC environment. Most existing solutions treat task offloading as an offline decision process that uses transient user positions as model inputs. In this paper, instead, we propose a predictive, trajectory-aware online MEC task offloading strategy called PreMig. The strategy first predicts the future trajectory of the edge user to whom an edge service belongs with a polynomial sliding-window model, then calculates the user's dwell time within the signal coverage of each edge server, and finally performs edge service assignment with a greedy strategy. To verify the effectiveness of the designed approach, we conduct simulation experiments based on a real-world MEC deployment dataset and a campus mobility trajectory dataset; experimental results clearly demonstrate that the proposed strategy outperforms the traditional strategy in two key performance metrics, namely the average service rate and the number of user service migrations.
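The prediction step (fit a low-order polynomial to a sliding window of recent positions, extrapolate, and derive the dwell time within a server's coverage) can be sketched with NumPy; the track, coverage disk and horizon below are invented for the illustration:

    # Sketch: fit a polynomial to a sliding window of recent positions,
    # extrapolate, and estimate dwell time inside a server's coverage disk.
    import numpy as np

    t = np.arange(10)                           # last 10 time steps
    x = 0.5 * t + 0.02 * t**2                   # observed user track (toy)
    y = 1.0 + 0.3 * t

    deg, horizon = 2, np.arange(10, 30)         # quadratic fit, 20-step lookahead
    fx = np.polyfit(t, x, deg)
    fy = np.polyfit(t, y, deg)
    px, py = np.polyval(fx, horizon), np.polyval(fy, horizon)

    server, radius = np.array([12.0, 6.0]), 5.0  # assumed edge server coverage
    inside = (px - server[0])**2 + (py - server[1])**2 <= radius**2
    print("predicted dwell time:", inside.sum(), "steps")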
Construction Algorithm of Completely Independent Spanning Tree in Dragonfly Network
BIAN Qing-rong, CHENG Bao-lei, FAN Jian-xi, PAN Zhi-yong
Computer Science. 2022, 49 (11): 284-292.  doi:10.11896/jsjkx.211000037
Abstract PDF(4213KB) ( 376 )   
References | Related Articles | Metrics
Dragonfly network,proposed by Kim et al.,is a topology for high-performance computer systems.In a dragonfly network,compute nodes are attached to switches,the switches are organized into groups,and the network is organized as a two-level clique:there is a link between any two switches in a group,and there is a global link between any two groups.Completely independent spanning trees(CISTs) play important roles in reliable information transmission,parallel transmission and safe distribution of information.In practical applications,as network scale keeps increasing,the requirements on the efficiency and security of information transmission become higher and higher,so it is meaningful to study the construction of CISTs in dragonfly networks.At present,there are many results on CISTs in various networks,but CISTs in dragonfly networks have not been studied.In this paper,a construction algorithm is proposed according to the global link arrangement of the dragonfly network,distinguishing the construction of CISTs under relative,absolute and circulant link arrangements.Based on this division,a construction algorithm for the edge sets of completely independent spanning trees is given,and the correctness of the above algorithms is proved.Finally,the time complexity of these algorithms is analyzed.
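To fix the two-level clique structure described above, here is a toy Python sketch that enumerates the local and global links of a dragonfly; the round-robin choice of global-link endpoints is an illustrative simplification and does not reproduce the relative, absolute or circulant arrangements the paper distinguishes.

def dragonfly_edges(num_groups, switches_per_group):
    """Local links form a clique inside each group; one global link joins each
    pair of groups. Global-link endpoints are picked round-robin here, which
    is an illustrative simplification."""
    node = lambda g, s: g * switches_per_group + s
    edges = []
    for g in range(num_groups):                     # intra-group cliques
        for i in range(switches_per_group):
            for j in range(i + 1, switches_per_group):
                edges.append((node(g, i), node(g, j)))
    for g1 in range(num_groups):                    # inter-group global links
        for g2 in range(g1 + 1, num_groups):
            edges.append((node(g1, g2 % switches_per_group),
                          node(g2, g1 % switches_per_group)))
    return edges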
Application Layer Protocol Recognition Based on Residual Network and Recurrent Neural Network
WU Ji-sheng, HONG Zheng, MA Tian-tian, LIN Pei-hong
Computer Science. 2022, 49 (11): 293-301.  doi:10.11896/jsjkx.210800252
Abstract PDF(3062KB) ( 339 )   
References | Related Articles | Metrics
Existing protocol recognition methods cannot effectively extract the temporal and spatial characteristics of protocol data,which leads to low recognition accuracy.An application layer protocol recognition method based on a one-dimensional residual network and a recurrent neural network is proposed.The model consists of a one-dimensional pre-activation residual network(PreResNet) and a bidirectional gated recurrent neural network(BiGRU):PreResNet extracts the spatial characteristics of the protocol data,BiGRU extracts its temporal characteristics,and an attention mechanism extracts the key features related to protocol recognition to improve accuracy.The method first extracts application layer protocol data from network traffic and preprocesses it into one-dimensional vectors,then trains the classification model on the training data to obtain a protocol recognition model,and finally uses the trained model to identify application layer protocols.Experimental results on the public dataset ISCX2012 show that the proposed model achieves an overall accuracy of 96.87% and an average F-measure of 96.81%,both higher than those of other protocol recognition models.
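A compact PyTorch sketch of the architecture family the abstract describes (1D pre-activation residual blocks, a BiGRU, additive attention); the layer sizes and depths are illustrative assumptions, not the authors' exact model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock1d(nn.Module):
    """Pre-activation residual block: BN -> ReLU -> Conv, applied twice."""
    def __init__(self, ch):
        super().__init__()
        self.bn1, self.conv1 = nn.BatchNorm1d(ch), nn.Conv1d(ch, ch, 3, padding=1)
        self.bn2, self.conv2 = nn.BatchNorm1d(ch), nn.Conv1d(ch, ch, 3, padding=1)
    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out                              # identity shortcut

class ProtocolRecognizer(nn.Module):
    def __init__(self, n_classes, ch=32, hidden=64):
        super().__init__()
        self.stem = nn.Conv1d(1, ch, 3, padding=1)  # byte sequence -> features
        self.res = nn.Sequential(PreActBlock1d(ch), PreActBlock1d(ch))
        self.gru = nn.GRU(ch, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)         # additive attention scores
        self.fc = nn.Linear(2 * hidden, n_classes)
    def forward(self, x):                           # x: (batch, 1, length)
        h = self.res(self.stem(x)).transpose(1, 2)  # spatial features, (B, L, C)
        h, _ = self.gru(h)                          # temporal features, (B, L, 2H)
        w = torch.softmax(self.att(h).squeeze(-1), dim=1)
        ctx = (w.unsqueeze(-1) * h).sum(dim=1)      # attention-weighted summary
        return self.fc(ctx)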
Shard Load Balancing Method Using State Reduction
CHEN Jing, LI Zhi-huai, GAO Dong-xue, LI Min
Computer Science. 2022, 49 (11): 302-308.  doi:10.11896/jsjkx.210800109
Abstract PDF(2372KB) ( 347 )   
References | Related Articles | Metrics
Sharding technology is one of the core technologies for solving the scalability problem of blockchains.When transactions in the P2P network are assigned to established shards according to fixed rules and the verification nodes are distributed randomly and evenly across shards,the transaction verification load of an individual shard may far exceed the average load,causing transactions in that shard to be congested.To resolve such load imbalance between shards,a shard load balancing method using state reduction is proposed.Firstly,a state reduction model is given,which allows high-performance nodes to store more adjacent states,and node performance is roughly classified according to this model.Secondly,according to the transaction verification situation in each timeslot,unverified transactions are taken as the remaining load,which serves as the basis for adjusting verification capacity in the next timeslot.Finally,nodes are scored and graded,and a node selection strategy is given based on the remaining load and the average score of the consensus verification node set,so that nodes are allocated reasonably and randomly and the remaining load of heavily loaded shards is reduced.Experimental verification shows that the proposed method effectively balances uneven load across shards without reducing the transaction verification rate of a single shard.
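A toy Python sketch of the final node-selection step, assuming each node carries a score and that selection prefers nodes at or above the consensus set's average score while a shard still has remaining load; the data layout and thresholds are illustrative, not the paper's exact strategy.

import random

def select_validators(nodes, remaining_load, avg_score, k):
    """Prefer nodes scoring at or above the consensus set's average when the
    shard still carries remaining (unverified) load; otherwise draw from all.
    Each node is a dict like {"id": ..., "score": ...} (hypothetical layout)."""
    pool = [n for n in nodes if n["score"] >= avg_score] if remaining_load > 0 else list(nodes)
    pool = pool or list(nodes)              # fall back if no node qualifies
    return random.sample(pool, min(k, len(pool)))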
Communication Efficient Asynchronous ADMM for General Form Consensus Optimization
WANG Dong-xia, LEI Yong-mei, ZHANG Ze-yu
Computer Science. 2022, 49 (11): 309-315.  doi:10.11896/jsjkx.211200006
Abstract PDF(2610KB) ( 604 )   
References | Related Articles | Metrics
The distributed alternating direction method of multipliers(ADMM) is one of the most widely used methods for solving large-scale machine learning problems.However,most distributed ADMM algorithms are based on full model updates,and as system scale and data volume increase,communication cost becomes their bottleneck on big data.In order to reduce the communication cost in a distributed environment,a general form consensus asynchronous distributed ADMM(GFC-ADADMM) is proposed in this paper.First,in the GFC-ADADMM,only the associated model parameters rather than the full model parameters are transmitted among nodes,and the associated parameters are further filtered according to the characteristics of high-dimensional sparse datasets,which reduces the transmission load.Second,combining the advantages of the stale synchronous parallel(SSP) computing model,the asynchronous communication protocol and the allreduce communication mode,the GFC-ADADMM is implemented with an asynchronous allreduce framework and an MPI/OpenMP hybrid programming model,which improves both the calculation efficiency and the communication efficiency of the algorithm.Finally,the sparse logistic regression problem is solved by the GFC-ADADMM.Evaluation with large-scale datasets shows that compared with state-of-the-art asynchronous distributed ADMM algorithms,the GFC-ADADMM reduces the total running time by 15%-63% and converges with higher accuracy.
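A minimal sketch of the associated-parameter idea for sparse data: a worker exchanges only the model coordinates its local samples actually touch. The helper names are hypothetical and the sketch omits the ADMM iteration itself.

import numpy as np
from scipy.sparse import csr_matrix

def associated_coords(X_local):
    """Model coordinates this worker's sparse samples actually touch; in a
    general-form consensus update only these entries need to be exchanged."""
    return np.unique(csr_matrix(X_local).indices)

def pack_update(w_local, coords):
    """Ship (index, value) pairs instead of the full dense model vector."""
    return coords, np.asarray(w_local)[coords]

On high-dimensional sparse datasets each worker usually touches a small fraction of the coordinates, which is where the transmission-load reduction comes from.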
Information Security
Misinformation Correction Maximization Problem with Edge Addition in Social Networks
SONG Xin-yue, SHUAI Tian-ping, CHEN Bin
Computer Science. 2022, 49 (11): 316-325.  doi:10.11896/jsjkx.211000043
Abstract PDF(2388KB) ( 323 )   
References | Related Articles | Metrics
The popularity of online social networks such as WeChat has drawn increasing attention to information diffusion.The spread of misinformation in social networks may lead to serious consequences,such as economic losses and public panic,so measures need to be taken to control its spread.The classical misinformation containment problem aims to reduce the impact of misinformation by launching a set of nodes as real-information seeds to compete against the misinformation.This paper combines the spread of real information with edge addition and proposes the misinformation correction maximization problem.The problem is NP-hard,and computing its objective function is #P-hard.Since the objective function is neither submodular nor supermodular,we use the sandwich approximation strategy to obtain an approximate solution.We first find submodular lower and upper bound functions of the objective function,then obtain the corresponding approximate solutions under the cardinality constraint by the reverse influence sampling technique,and finally,combining the solutions for the lower and upper bound functions,obtain an approximate solution of the misinformation correction maximization problem with a data-dependent approximation ratio.Experiments on three real-world datasets indicate that the proposed algorithm is efficient and performs better than classical heuristic methods.
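A minimal sketch of the sandwich strategy, assuming set-valued objective functions; the paper estimates the influence objectives with reverse influence sampling, which this sketch abstracts behind plain function calls.

def greedy_maximize(f, budget, ground):
    """Plain greedy for monotone set-function maximization under |S| <= budget."""
    S = set()
    for _ in range(budget):
        rest = [v for v in ground if v not in S]
        if not rest:
            break
        best = max(rest, key=lambda v: f(S | {v}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break
        S.add(best)
    return S

def sandwich(f, f_lower, f_upper, budget, ground):
    """Run greedy on the submodular lower/upper bounds of a non-submodular f,
    then keep whichever solution the true objective f values most."""
    sols = [greedy_maximize(f_lower, budget, ground),
            greedy_maximize(f_upper, budget, ground)]
    return max(sols, key=f)

Because the bounds are submodular, each greedy run carries the usual (1-1/e)-style guarantee, and comparing the two solutions under f yields the data-dependent ratio mentioned above.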
Automatic Analysis Technology of Kernel Vulnerability Attack Based on Finite State Machine
LIU Pei-wen, SHU Hui, LYU Xiao-shao, ZHAO Yun-tian
Computer Science. 2022, 49 (11): 326-334.  doi:10.11896/jsjkx.211200039
Abstract PDF(3471KB) ( 401 )   
References | Related Articles | Metrics
Kernel vulnerability attacks are a common way to attack operating systems,and analyzing each attack stage is the key to defending against them.Because kernel vulnerability types,trigger paths and exploit modes are complex and varied,the attack process of a kernel vulnerability is difficult to analyze.Moreover,existing analysis work mainly relies on forward program analysis methods such as taint analysis,whose efficiency is low.To improve analysis efficiency,this paper implements an automatic analysis technology for kernel vulnerability attacks based on finite state machines.Firstly,a state transition diagram of kernel vulnerability attacks is constructed as the key basis for analysis.Secondly,the idea of reverse analysis is introduced,and a reverse analysis model of the kernel vulnerability attack process based on a finite state machine is established,which reduces unnecessary analysis cost.Finally,based on this model,a reverse analysis method for kernel vulnerability attacks is implemented,which can automatically and quickly analyze the kernel vulnerability attack process.Tests on 10 attack samples show that the reverse analysis method accurately obtains the key code execution information and,compared with traditional forward analysis methods,greatly improves analysis efficiency.
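A toy Python sketch of the finite-state-machine view, with invented stage names; it only illustrates why working backwards from an observed end state (enumerating predecessor transitions) can prune work that a forward analysis would spend.

# Invented stage names for illustration; the paper's state transition diagram
# is richer and built from real kernel-exploit behaviour.
TRANSITIONS = {
    ("init", "trigger_vuln"): "triggered",
    ("triggered", "corrupt_kernel_object"): "corrupted",
    ("corrupted", "hijack_control_flow"): "hijacked",
    ("hijacked", "elevate_privilege"): "compromised",
}

def replay(events, state="init"):
    """Forward replay of an event trace through the state machine."""
    path = [state]
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)
        path.append(state)
    return path

def predecessors(state):
    """Reverse step: (state, event) pairs that could have produced `state`.
    Stepping backwards from the observed end state prunes paths a forward
    taint analysis would have to explore."""
    return [(s, e) for (s, e), t in TRANSITIONS.items() if t == state]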
Privacy-preserving Scheme of Energy Trading Data Based on Consortium Blockchain
SHI Kun, ZHOU Yong, ZHANG Qi-liang, JIANG Shun-rong
Computer Science. 2022, 49 (11): 335-344.  doi:10.11896/jsjkx.220300138
Abstract PDF(4770KB) ( 489 )   
References | Related Articles | Metrics
Blockchain technology can effectively solve the problems of lack of trust,malicious tampering and false transactions.However,the open and transparent nature of the blockchain makes blockchain-based distributed energy trading extremely vulnerable to attacks that disclose users' private information.Therefore,a privacy-preserving scheme,BLDP-AM,based on a differential privacy algorithm and account mapping technology is proposed to protect the privacy of trading data.The scheme redesigns the data perturbation mechanism of the local differential privacy algorithm to make it applicable to blockchain technology,and constructs the BLDP algorithm based on this perturbation mechanism to protect the privacy of transaction data.At the same time,to ensure the correctness of trading and hide the characteristics of the trading curve,the scheme first associates each user with multiple accounts through account mapping technology,then uses the exponential smoothing prediction algorithm to calculate a predicted trading value for each account,and finally uses the BLDP algorithm to perturb the predicted trading value to obtain the value used in the actual trade.The scheme not only guarantees the correctness of transactions but also protects the privacy of trading data.Privacy analysis proves the feasibility of the scheme in protecting user privacy,and experimental analysis shows that the scheme has good performance.
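A minimal sketch of the prediction-then-perturbation pipeline, using simple exponential smoothing and a standard Laplace perturbation as a stand-in; the paper's BLDP mechanism is a perturbation redesigned for blockchain and differs from this stand-in.

import math
import random

def exp_smooth(series, alpha=0.3):
    """Simple exponential smoothing forecast of an account's next trading value."""
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def laplace_perturb(value, epsilon=1.0, sensitivity=1.0):
    """Standard Laplace perturbation via inverse-CDF sampling (a stand-in for
    the redesigned BLDP perturbation)."""
    u = random.random() - 0.5               # uniform in [-0.5, 0.5)
    return value - (sensitivity / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))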
Secure Multi-party Computing Protocol Based on Efficient Fully Homomorphic Encryption
ZHU Zong-wu, HUANG Ru-wei
Computer Science. 2022, 49 (11): 345-350.  doi:10.11896/jsjkx.210900047
Abstract PDF(1715KB) ( 592 )   
References | Related Articles | Metrics
In view of the large ciphertext size and low efficiency of current secure multi-party computation protocols based on fully homomorphic encryption,this paper proves that the fully homomorphic encryption scheme supporting multi-bit encryption proposed by Chen et al. satisfies key homomorphism.Based on this scheme and threshold decryption,an efficient secure multi-party computation protocol with three rounds of interaction under the common random string(CRS) model is designed.Using non-interactive zero-knowledge proofs,the protocol is shown to be secure under the malicious model,and its security can be reduced to variants of the learning with errors(LWE) problem.Compared with existing protocols in the CRS model,the protocol supports multi-bit encryption,which effectively reduces the complexity of homomorphic NAND gate evaluation;at the same time,the ciphertext is smaller,the amount of calculation is reduced,and the time and space efficiency are improved.
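As a toy illustration of key homomorphism and threshold decryption (not Chen et al.'s scheme), the following LWE-flavoured sketch encrypts under the sum of the parties' keys and decrypts by summing per-party shares; the parameters are far too small to be secure and real schemes add smudging noise to the shares.

import random

Q, N, HALF = 2**16, 8, 2**15    # toy modulus, dimension, one-bit scaling

def keygen():
    return [random.randrange(Q) for _ in range(N)]

def inner(a, s):
    return sum(x * y for x, y in zip(a, s)) % Q

def encrypt_bit(bit, keys):
    """b = <a, sum of keys> + bit*Q/2 + small noise; an LWE-flavoured toy."""
    a = [random.randrange(Q) for _ in range(N)]
    s = [sum(col) % Q for col in zip(*keys)]    # key homomorphism: s = sum s_i
    b = (inner(a, s) + bit * HALF + random.randrange(-4, 5)) % Q
    return a, b

def partial_share(a, s_i):
    """Each party reveals only <a, s_i>; the shares add up to <a, s>."""
    return inner(a, s_i)

def combine(b, shares):
    v = (b - sum(shares)) % Q
    return 1 if Q // 4 < v < 3 * Q // 4 else 0

# keys = [keygen() for _ in range(3)]
# a, b = encrypt_bit(1, keys)
# assert combine(b, [partial_share(a, k) for k in keys]) == 1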
Differential Privacy Based Fingerprinting Obfuscation Mechanism Towards Network Reconnaissance Deception
HE Yuan, XING Chang-you, ZHANG Guo-min, SONG Li-hua, YU Hang
Computer Science. 2022, 49 (11): 351-359.  doi:10.11896/jsjkx.220400285
Abstract PDF(3013KB) ( 344 )   
References | Related Articles | Metrics
Network fingerprinting detection is an important network reconnaissance method that attackers can use to obtain the fingerprinting characteristics of a target network and thereby support subsequent targeted attacks.Fingerprinting obfuscation technology makes attackers form fake fingerprinting views by actively modifying the fingerprinting features in response packets.However,existing obfuscation methods remain insufficient against attackers' strategic detection and analysis.To this end,a differential privacy based fingerprinting obfuscation mechanism(DPOF) towards network reconnaissance deception is proposed.Drawing on the idea of data privacy protection,DPOF first establishes a utility-driven differential privacy fingerprinting obfuscation model and computes the obfuscation probability of fake fingerprints of different utilities through the differential privacy exponential mechanism.On this basis,a fingerprinting obfuscation decision method under resource constraints is designed,and an obfuscation strategy solving algorithm based on particle swarm optimization is implemented.Simulation results show that,compared with existing typical fingerprinting obfuscation methods,DPOF achieves better obfuscation effects across problem scales and budgets,and obtains a better approximately optimal strategy at a faster speed.
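A minimal sketch of the differential privacy exponential mechanism applied to fake-fingerprint selection; the candidate set and utility function are assumed inputs.

import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Sample a fake fingerprint with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility(c) / (2 * sensitivity)) for c in candidates]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return cand
    return candidates[-1]

Smaller epsilon flattens the distribution (more deception, less utility); larger epsilon concentrates mass on high-utility fakes, which matches the utility/privacy trade-off the mechanism is built around.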
Improvement of PBFT Algorithm Based on Consortium Blockchain
XIE Zhuo, ZHANG Zhi-hong, LI Lei, FENG Ying-jie, CHEN Jing
Computer Science. 2022, 49 (11): 360-367.  doi:10.11896/jsjkx.210900178
Abstract PDF(2620KB) ( 489 )   
References | Related Articles | Metrics
As a new technology,blockchain has attracted the attention of many industries since its birth.Consensus algorithms are one of the core technologies of blockchain,and their research is a top priority of blockchain development.Aiming at the problems of the PBFT algorithm widely used in consortium blockchains,such as the random selection of the primary node and the inability of nodes to join and exit dynamically,a dynamic PBFT algorithm(DPBFT) is proposed.Firstly,the primary node selection method of PBFT is improved:a trust score is maintained for every node and updated dynamically according to the node's behavior in every round of consensus,and the primary node is selected according to the trust score,which increases the probability that an honest node is elected primary.Secondly,four sub-protocols(JOIN,EXIT,PCLEAR,RCLEAR) are added to PBFT to handle nodes joining and exiting and to punish misbehaving nodes,giving the system a dynamic network structure.It can be proved that the four new sub-protocols enjoy good safety and liveness without affecting the safety and liveness of the original PBFT algorithm.Finally,experimental results show that the DPBFT algorithm achieves better consensus efficiency than the traditional PBFT algorithm.
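A toy sketch of score-based primary election, assuming a reward/penalty trust update and deterministic tie-breaking by view number so that all replicas agree; the constants are illustrative, not DPBFT's.

def update_score(score, behaved_well, reward=1.0, penalty=5.0, lo=0.0, hi=100.0):
    """Per-round trust update: small reward for honest behaviour, a larger
    penalty for detected faults; the constants are illustrative."""
    score = score + reward if behaved_well else score - penalty
    return max(lo, min(hi, score))

def elect_primary(scores, view):
    """Choose the primary among the highest-scoring nodes; tie-breaking by the
    view number is deterministic, so every replica computes the same result."""
    top = max(scores.values())
    candidates = sorted(n for n, s in scores.items() if s == top)
    return candidates[view % len(candidates)]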