Computer Science

Error Log Analysis and System Optimization for Lustre Cluster Storage

CHENG Wen, LI Yan, ZENG Ling-fang, WANG Fang, TANG Shi-cheng, YANG Li-ping, FENG Dan, ZENG Wen-jun

Computer Science. 2022, 49 (10): 1-9. doi:10.11896/jsjkx.220100134

Abstract

Error messages from cluster storage systems can help improve the availability and reliability of those systems. Previous research on storage system error analysis has focused on local file systems or on individual parts of a cluster storage system. Long-term, multi-dimensional studies of storage system error messages in practical deployments are lacking. As new functional modules are continually integrated, cluster storage systems become increasingly complex and the errors they produce keep multiplying, which poses challenges for researchers and developers. To address these problems, we conduct a comprehensive study of Lustre system error logs. Collecting error logs over 1,673 consecutive days, we study nearly 2.26 GB of Lustre error logs and analyze the characteristics and problems of Lustre system errors across multiple Lustre versions. We show correlated errors between different subsystems and study the factors that may affect different Lustre versions. We also summarize the common errors in the Lustre system and present the corresponding solutions. We derive numerous new insights into the Lustre development process and report 14 findings. Finally, we collect new error logs for 333 consecutive days to verify the 14 findings and present several error optimization cases. Experimental results show that these optimizations significantly reduce the number of errors and improve the availability and stability of the system. Our results and suggestions should be useful both for the development of cluster storage systems and for Lustre operation and maintenance.

Study on Implementation and Optimization of ARM-based Image Geometric Transformation Library

WANG Lu-han, JIA Hai-peng, ZHANG Yun-quan, ZHANG Guang-ting

Computer Science. 2022, 49 (10): 10-17. doi:10.11896/jsjkx.220100128

Abstract

Intel Integrated Performance Primitives (IPP) is a high-performance multimedia acceleration library for signal and image processing. However, there is currently no high-performance IPP library for the ARM architecture. This paper implements PerfIPP, a high-performance algorithm library on the ARM computing platform for basic image geometric transformation algorithms such as mirror, remap, and affine/perspective transformation. PerfIPP is optimized through SIMD assembly, memory alignment, data pre-calculation, and high-performance matrix optimization techniques, which significantly improve the performance of these algorithms. This paper also summarizes the key techniques for implementing and optimizing image geometric transformation algorithms on the ARM computing platform by comparing the performance differences caused by different instruction combinations, instruction orderings, and memory access methods. Experimental results show that, on the Huawei Kunpeng 920 platform, PerfIPP achieves a 108.08%~435.5% performance improvement in image transformation over the open source computer vision library while meeting accuracy requirements. It also achieves 83.79% of the average performance of the Intel IPP library on an Intel Xeon E5-2640 processor.

Prediction of Optimal Loop Tiling Size for Stencil Computation Based on Neural Network Model

BAO Yi-kun, ZHANG Peng, XU Xiao-wen, MO Ze-yao

Computer Science. 2022, 49 (10): 18-26. doi:10.11896/jsjkx.220100147

Abstract

Stencil computation is one of the most important kinds of loop kernels in scientific and engineering computing applications. Loop tiling can effectively improve the data locality and the degree of parallelism of stencil computation, but the best tile size is hard to choose. Traditional tile size selection methods usually fall short in time overhead, labor cost, or model accuracy. This paper proposes a tile size selection method based on an artificial neural network to predict the optimal tile size of three-dimensional Jacobi stencil loop programs. Experimental results show that, for 11 real stencil programs, programs using the tile size predicted by the model improve on the untiled versions by 2% in serial tests and 35% in parallel tests. Compared with the well-known grid search method, our method has similar prediction accuracy but takes only one thirty-thousandth of the online time cost. In addition, compared with the Turbo-tiling method, our method improves the performance of tiled code by nearly 9% on average.
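For readers unfamiliar with loop tiling, the following is a minimal sketch of an untiled and a tiled sweep of a 7-point three-dimensional Jacobi stencil. It only illustrates what a tile size controls; the function names, the flat-array layout, and the averaging weights are assumptions made for illustration and are not taken from the paper, whose contribution is predicting the tile size rather than the tiling itself.

```python
def idx(i, j, k, n):
    """Flatten 3D coordinates into an index of an n*n*n row-major array."""
    return (i * n + j) * n + k


def update(src, dst, i, j, k, n):
    """7-point Jacobi update: average of a cell and its six face neighbors."""
    dst[idx(i, j, k, n)] = (
        src[idx(i, j, k, n)]
        + src[idx(i - 1, j, k, n)] + src[idx(i + 1, j, k, n)]
        + src[idx(i, j - 1, k, n)] + src[idx(i, j + 1, k, n)]
        + src[idx(i, j, k - 1, n)] + src[idx(i, j, k + 1, n)]
    ) / 7.0


def jacobi_untiled(src, n):
    """One sweep over the interior points in plain i, j, k order."""
    dst = list(src)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update(src, dst, i, j, k, n)
    return dst


def jacobi_tiled(src, n, tile):
    """Same sweep, but the iteration space is cut into ti*tj*tk blocks."""
    ti, tj, tk = tile
    dst = list(src)
    for ii in range(1, n - 1, ti):          # loops over tile origins
        for jj in range(1, n - 1, tj):
            for kk in range(1, n - 1, tk):
                # loops inside one tile, clipped at the grid boundary
                for i in range(ii, min(ii + ti, n - 1)):
                    for j in range(jj, min(jj + tj, n - 1)):
                        for k in range(kk, min(kk + tk, n - 1)):
                            update(src, dst, i, j, k, n)
    return dst
```

Both functions compute the same values; only the traversal order differs. In compiled code the tuple `tile` is the parameter that affects cache reuse, and it is the quantity a tile size selection method has to choose.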

Design and Implementation of Multithreaded Reproducible DGEMV for Phytium Processor

CHEN Lei, TANG Tao, QI Hai-jun, JIANG Hao, HE Kang

Computer Science. 2022, 49 (10): 27-35. doi:10.11896/jsjkx.220100125

Abstract

In high-performance computing, the accumulation of rounding errors when solving large-scale, long-running, and ill-conditioned problems can invalidate the results. Reproducible results help developers debug programs and check their correctness, so the reproducibility of an algorithm's numerical results becomes very important. Based on the OpenBLAS framework, and combining Demmel's reproducible method in ReproBLAS with the multilayer blocking technique proposed by Castaldo, this paper designs a reproducible multithreaded DGEMV algorithm for the Phytium processor, using rounding error analysis and error-free transformations. Numerical experiments show that the output of the algorithm is identical to that of ReproBLAS, which verifies its reproducibility. Our algorithm is up to 2x faster than the one in ReproBLAS. Compared with the DGEMV function of OzBLAS proposed by Mukunoki, our algorithm runs at least 20x faster with a single thread and 9x faster with multiple threads. Theoretical analysis and numerical experiments show that the improved algorithm is accurate, valid, and efficient.
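As background for the term "error-free transformation", the sketch below shows the classic TwoSum transformation, which splits a floating-point addition into the rounded sum and its exact rounding error. This is a generic building block, shown under the assumption of IEEE 754 double precision without overflow; it is not the paper's DGEMV algorithm or the ReproBLAS method.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    b_virtual = s - a                         # the part of b that made it into s
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def exact(x):
    """Exact rational value of a float, used only to check the identity."""
    return Fraction(x)
```

For example, `two_sum(1e16, 1.0)` returns `(1e16, 1.0)`: the rounded sum loses the small addend, and the error term recovers it. Algorithms that track such error terms can then control or cancel the order-dependent rounding that makes parallel sums irreproducible.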

“AI+HPC”-based Time Prediction for the First Principle Calculations and Its Applications in Biomed Community

LI Zhi-ying, MA Shuo, ZHOU Chao, MA Ying-jin, LIU Qian, JIN Zhong

Computer Science. 2022, 49 (10): 36-43. doi:10.11896/jsjkx.220100129

Abstract

Among the commonly used first-principles methods, density functional theory (DFT) combines low computational scaling with high accuracy, so it is increasingly used in chemistry, biology, medicine, and other fields. In practical applications, however, its relatively high computational cost poses new challenges for users choosing calculation parameters and for computing centers assigning tasks. We have recently developed a time prediction system for DFT calculations based on machine learning, which can predict the actual computational cost before a calculation is run. The mean relative errors are normally less than 0.15, which meets the prediction accuracy requirements of practical scenarios. In this work, we further extend and improve the prediction system. We provide multi-GPU parallel computing and modular additions to the machine learning models. We integrate the system with the biomed community platform to display the computing tasks submitted to the platform in real time, which makes coordination easier for users. We also develop an intelligent load balancing module, which improves the efficiency of first-principles calculations for very large molecules and cluster systems. These efforts improve the practicality of the prediction system, and preliminary applications are reported for both the community platform and parallel computing.

Matrix Multiplication Vector Code Generation Based on Polyhedron Model

WANG Bo-yang, PANG Jian-min, XU Jin-long, ZHAO Jie, TAO Xiao-han, ZHU Yu

Computer Science. 2022, 49 (10): 44-51. doi:10.11896/jsjkx.210800247

Abstract

Matrix multiplication is the core of many scientific computations, and vectorization is one of the main ways to improve its performance. Existing vectorization optimizations often require manual tuning and must be mapped to the hardware structure. To address this, a vector code generation framework is introduced into the polyhedron model on the basis of the polyhedron compiler PPCG, yielding a matrix multiplication vector code generation framework based on the polyhedron model. A profit analysis of matrix multiplication vectorization schemes determines the scheme to use and guides the framework's code generation. The framework thus supports rapid vectorization optimization of matrix multiplication. Experiments are conducted on 13 matrix multiplication cases with sizes between 64×64×64 and 1,024×1,024×1,024. The results show that the framework generates vectorized code correctly. Compared with the automatic vectorization of the baseline compiler ICC, the vectorized code generated by the framework achieves a speedup of up to 5.09 times and an average speedup of 3.39 times.

Distributed Lock with Inter-core Passing for SW26010 Processor

LI Ming-liang, PANG Jian-min, YUE Feng

Computer Science. 2022, 49 (10): 52-58. doi:10.11896/jsjkx.210800091

Abstract

In parallel programs, a mutual exclusion lock is often used to avoid conflicts when accessing shared resources. The SW26010 processor, which is deployed on the Sunway TaihuLight supercomputer, is a heterogeneous many-core processor with no hardware lock mechanism for its co-processing cores. Developers have built a software lock mechanism based on atomic instructions, but the software lock incurs significant overhead and degrades the performance of parallel programs. To solve this issue, HDT-LOCK, a distributed lock mechanism with inter-core passing, is proposed. First, a hybrid distributed lock is proposed and implemented on the scratchpad memory of the co-processing cores to mitigate memory congestion. Furthermore, an inter-core passing mechanism using register communication and single-instruction multiple-data instructions is developed to improve the throughput of HDT-LOCK. Experimental results show that the proposed HDT-LOCK mechanism mitigates memory congestion and scales better. In addition, the lock passing mechanism improves HDT-LOCK throughput by up to 5.6x.

CPU Power Model for ARM Architecture Cloud Servers

JIN Yu-yan, YU Tian-hao, WANG Song-bo, LIN Wei-wei, PAN Yu-cong

Computer Science. 2022, 49 (10): 59-65. doi:10.11896/jsjkx.210800103

Abstract

The power model of cloud servers is an important topic in research on energy consumption optimization for cloud data centers, and the CPU power model is an important part of it. However, existing CPU power models do not consider CPU heterogeneity; in particular, there is little research on CPU power models for ARM architecture cloud servers. Based on an investigation and analysis of existing ARM architecture CPU power models, this paper proposes a new CPU power model for the ARM architecture, the hybrid based model (HBM). HBM jointly considers modeling features such as CPU utilization and CPU performance events. Compared with the existing PMC based model, which has high measurement accuracy, HBM has similar measurement accuracy and a lower model training cost. Thus, HBM is more suitable for CPU power modeling of ARM servers. This paper uses the Sysbench benchmark to verify HBM, and experimental results show that the mean relative error (MRE) of HBM is within 1%, which indicates high measurement accuracy. Cross-experiments are also conducted on x86 and ARM architecture servers, and the results show that the CPU power behaviors of servers with different architectures differ, so different CPU power modeling methods should be used.

Parallel Optimization of Computational Fluid Dynamics Application Palabos Based on Next-generation Sunway Supercomputer

LIU An-jun, YIN Hong-hui, WANG Li, LIU Zhi-xiang, KONG Bo, GUO Meng, CHEN Cheng-min, YANG Mei-hong

Computer Science. 2022, 49 (10): 66-73. doi:10.11896/jsjkx.220100089

Abstract

The parallel lattice Boltzmann (Palabos) software is a computational fluid dynamics package based on the lattice Boltzmann method (LBM). It is widely used for porous media, free interfaces, particle motion, blood flow, and other applications because of its excellent computing capability. Palabos has a broad user base, which makes it urgent to port, optimize, and accelerate it in parallel on the Sunway supercomputer to serve the energy and chemical industry. In this paper, a heterogeneous parallel design of Palabos is carried out on the new generation Sunway supercomputer system (SW26010pro). The data structures and template programming of Palabos are not well suited to the heterogeneous parallelism of the Sunway system, so we design parallel optimization techniques called direct getting address, polymorphic tag processing, and data slicing to handle them. In line with the characteristics of the new generation Sunway system, optimizations based on shared memory and register memory access (RMA) are also adopted. With 64 computing processing elements (CPEs), the speedup is 2x to 6x. On the new generation Sunway supercomputer, Palabos achieves parallel computation at the scale of one million cores for a two-phase flow algorithm in the field of complex multi-scale chemical processes. The parallel efficiency at one million cores is more than 40% relative to 64,000 cores.

Implementation of FPGA-based High-performance and Scalable SM4-GCM Algorithm

ZHAI Jia-qi, LI Bin, ZHOU Qing-lei, CHEN Xiao-jie

Computer Science. 2022, 49 (10): 74-82. doi:10.11896/jsjkx.210900137

Abstract

With the rapid development of big data and 5G technology, information encryption in high-speed communication systems has become a new research hotspot. How to increase data throughput and make encryption algorithms easier to adapt to different application scenarios, while ensuring high data security, has become an important research topic. Traditional software implementations of the SM4-GCM algorithm have low throughput and are difficult to apply in changing 5G and big data scenarios. To address this problem, this paper analyzes the characteristics of the SM4-GCM algorithm and, exploiting the reconfigurability of FPGAs, uses the Mastrovito, Karatsuba, and fast remainder algorithms to design two high-performance, CNC-separated, and expandable circuit structures. Full-pipeline technology and four-degree parallel technology are used to accelerate the SM4-GCM algorithm. While ensuring high security, the design achieves a high throughput rate and can be flexibly ported to various application scenarios. Experimental results show that the throughput rates of the two proposed solutions for a single SM4-GCM module reach 28.16 Gbps and 28.8 Gbps, respectively, which is superior to similar published designs in terms of performance and scalability.
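The Karatsuba algorithm mentioned above is typically applied to the carry-less (binary polynomial) multiplication at the heart of the GCM authentication step. The sketch below shows that idea in software under stated assumptions: operands are non-negative integers smaller than 2^bits whose bits are polynomial coefficients over GF(2), the function names are illustrative, and the subsequent reduction modulo the GCM field polynomial and GCM's bit ordering are deliberately left out. It does not describe the paper's circuit structures.

```python
def clmul_schoolbook(a, b):
    """Carry-less product of two bit-polynomials: shift-and-XOR, one bit of b at a time."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a, b, bits=128):
    """Carry-less product using three half-size products instead of four.

    Assumes 0 <= a, b < 2**bits and that bits is a power of two.
    """
    if bits <= 8:
        return clmul_schoolbook(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half)
    hi = clmul_karatsuba(a_hi, b_hi, half)
    # Over GF(2), addition and subtraction are both XOR.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half) ^ lo ^ hi
    return (hi << (2 * half)) ^ (mid << half) ^ lo
```

In hardware the same decomposition trades multiplier area for a few extra XOR stages, which is why it is a common choice for 128-bit GF(2) multipliers.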

Research Advances in Knowledge Tracing

CHEN Zhi-yu, SHAN Zhi-long

Computer Science. 2022, 49 (10): 83-95. doi:10.11896/jsjkx.211000119

Abstract

Educational data mining is an interdisciplinary subject spanning computer science, statistics, and pedagogy. It mainly addresses problems in educational research and teaching practice using the theory and techniques of computer science and statistics. For example, it can reduce students' learning costs and teachers' teaching costs as far as possible while maximizing learning gain. The rapid development of computer-assisted education environments and online education platforms has generated a wealth of data, but it also poses a major challenge: these platforms cannot yet provide resources tailored to students' specific needs. Knowledge tracing is a personalized method for recommending teaching resources and diagnosing learning paths in the field of intelligent tutoring. It models how students' knowledge states evolve over time in order to predict their future performance from their historical response sequences. This paper analyzes the relevant literature from two aspects, knowledge tracing models whose training process is interpretable and models whose prediction results are highly accurate, and then introduces the public datasets, evaluation metrics, and applications in this field. Finally, the challenges facing knowledge tracing are discussed.

Edge Bundling Method Based on Homologous Control Points

LIU Meng-xin, ZHANG Fan, LI Tian-rui

Computer Science. 2022, 49 (10): 96-102. doi:10.11896/jsjkx.220300066

Abstract

Edge bundling is an effective way to reduce the visual clutter that arises when visualizing node-link diagrams with many complex connections. Edge bundling based on spatial proximity generally leads to independent edge ambiguity and gives users a wrong perception. On the other hand, focusing only on the topological structure of the graph cannot substantially reduce the visual clutter caused by dense connections. Methods based on edge paths control and bundle edges using the original nodes of the graph, which avoids independent edge ambiguity and reveals high-level patterns in the data. Therefore, an edge bundling method based on homologous control points is proposed to improve the edge path method. Using the topology of the graph, the method computes homologous control points and selects edge control points with a shortest path algorithm. The degree of edge aggregation is then optimized following the idea of gradation. Finally, the edges are smoothed with Bezier curves and colored according to their direction. The method is applied to the US migration dataset and the Chinese railway line dataset. Experimental results show that it effectively alleviates over-bundling. Compared with the original method, it retains more local detail, balances the bundling degree between global and local edges, and can be used effectively to visualize graphs with complex connections.
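The final smoothing step can be pictured with a small sketch: once an edge has a list of control points, a Bezier curve through them can be sampled with de Casteljau's algorithm. The function names and the sampling density below are illustrative assumptions; how the paper chooses and weights its control points is not reproduced here.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeated linear interpolation."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def bezier_polyline(control_points, samples=16):
    """Sample the curve at samples + 1 evenly spaced parameters for drawing."""
    return [bezier_point(control_points, i / samples) for i in range(samples + 1)]
```

For an edge from `(0, 0)` to `(2, 0)` with a single intermediate control point `(1, 2)`, `bezier_point([(0, 0), (1, 2), (2, 0)], 0.5)` gives `(1.0, 1.0)`: the curve bends toward the control point without passing through it, which is what produces the smooth bundled look.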

Local Random Walk Based Label Propagation Algorithm

LIU Yang, ZHENG Wen-ping, ZHANG Chuan, WANG Wen-jian

Computer Science. 2022, 49 (10): 103-110. doi:10.11896/jsjkx.220400145

Abstract

Community structure is one of the important characteristics of complex networks. Identifying communities with different functions in a network plays an important role in revealing those characteristics. Community discovery algorithms based on label propagation use the community labels of a node's direct neighbors to update its label, which may yield inaccurate community structures. Furthermore, the results of multiple runs of the algorithm may be unstable. To solve this problem, a local random walk based label propagation algorithm (LRW-LPA) is proposed. First, the local importance of each node in its k-step neighborhood is calculated. Then, the node with the lowest local importance is selected as the starting node of a local random walk. When the random walker steps out of the specified neighborhood, it returns to the starting node and starts the walk again. Finally, the algorithm updates the label of the starting node to the label that occurs most often in the local neighborhood, and when several labels tie for the most occurrences it selects the label of the node with the highest importance. Because LRW-LPA can determine an appropriate neighborhood of a node through the local random walk with restart, the stability of the algorithm improves greatly. Comparisons with LPA, BGLL, Infomap, Leiden, Walktrap, and other classical algorithms on 12 real networks and 12 synthetic networks show that the proposed LRW-LPA performs well in terms of normalized mutual information (NMI), adjusted Rand index (ARI), and modularity (Q).
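The update of a single starting node can be sketched as follows. This is a simplified illustration of the walk-with-restart idea only, under several assumptions that are not in the abstract: the graph is an adjacency-list dictionary without isolated nodes, the walk has a fixed number of steps, labels are counted on every accepted move, and `importance` is any precomputed score (node degree is used in the usage note). All function names are hypothetical.

```python
import random
from collections import Counter, deque


def k_step_neighborhood(adj, start, k):
    """Return the set of nodes within k hops of start (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def walk_label_votes(adj, labels, start, k, steps, rng):
    """Count labels seen by a walker that restarts whenever it would leave the neighborhood."""
    allowed = k_step_neighborhood(adj, start, k)
    votes = Counter()
    current = start
    for _ in range(steps):
        nxt = rng.choice(adj[current])
        if nxt not in allowed:
            current = start  # walked out of the neighborhood: restart
            continue
        current = nxt
        votes[labels[current]] += 1
    return votes


def update_label(adj, labels, importance, start, k=2, steps=200, seed=0):
    """Pick the most frequent label; break ties by the most important node holding one."""
    votes = walk_label_votes(adj, labels, start, k, steps, random.Random(seed))
    if not votes:
        return labels[start]
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    holders = [n for n in k_step_neighborhood(adj, start, k) if labels[n] in tied]
    return labels[max(holders, key=lambda n: importance[n])]
```

A usage example on two triangles joined by one edge: with `adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}`, labels `{0: 'x', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}`, degree as importance, and `k=1`, the walker started at node 0 never counts a label from the other triangle, and node 0 adopts label `'a'`.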

Study on Data Filling Based on Global-attributes Attention Neural Process Model

CHEN Kai, LIU Man, WANG Zhi-teng, MAO Shao-chen, SHEN Qiu-hui, ZHANG Hong-jun

Computer Science. 2022, 49 (10): 111-117. doi:10.11896/jsjkx.210800038

Abstract

The attention neural process (ANP) model follows the generative-model approach: it takes an arbitrary number of context points from a sample as input and outputs the distribution function of the entire sample, thereby approximating Gaussian process regression (GPR) to perform the data filling task. In practice, many scenes and datasets contain attribute or label data that are critical for generating the missing data, but ANP does not make full use of them. Inspired by the CVAE model, which controls sample generation by conditioning on labels, this paper proposes the global attribute attentional neural process (GANP), which embeds sample attributes or labels into the ANP network so that the model generates samples more accurately, especially when input context points are scarce. Specifically, the sample attributes are embedded into the encoder network so that the latent variables contain the sample attribute information. The sample attributes are also added as features in the decoder network to help generate more accurate samples. Finally, experimental results demonstrate the superiority of GANP both qualitatively and quantitatively, and show that GANP extends the application of the NP family, which can solve the Gaussian process regression problem more flexibly, quickly, and accurately.

Adaptive Grouping Fusion Improved Arithmetic Optimization Algorithm and Its Application

LIU Cheng-han, HE Qing

Computer Science. 2022, 49 (10): 118-125. doi:10.11896/jsjkx.210800008

Abstract

The arithmetic optimization algorithm (AOA) converges slowly, has low convergence accuracy, and easily falls into local extrema. To solve these problems, an adaptive grouping fusion improved arithmetic optimization algorithm (AG-AOA) is proposed. First, a Halton sequence is used to initialize individual positions to improve the diversity of the population in the initial iterations. Then, an adaptive grouping strategy is introduced, and individuals are divided into a dominant group, an equilibrium group, and an inferior group according to their fitness values. Finally, the teaching and learning optimization strategy, the elite reverse learning strategy, and an oscillating disturbance operator are used to update the positions of the individuals in each group, which improves the search ability of AOA and reduces the influence of local extreme points. The performance of AG-AOA is validated on test suites containing problems of widely varying complexity. The analyses include benchmark functions, the Wilcoxon rank-sum test for statistical significance, and part of the CEC2014 test functions. AG-AOA is also applied to two practical engineering optimization problems, and the results are analyzed and compared with those of other metaheuristic algorithms to show the superiority of the proposed AG-AOA.
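The Halton initialization step can be illustrated with a short sketch. It generates low-discrepancy points and scales them into the search bounds; the function names, the choice of prime bases, and the 1-based indexing are assumptions for illustration, and the abstract does not say which variant of the sequence the authors use.

```python
def halton(index, base):
    """Return the index-th element (index >= 1) of the van der Corput sequence in a base."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def halton_population(size, lower, upper, primes=(2, 3, 5, 7, 11, 13)):
    """Build `size` individuals inside [lower, upper], one prime base per dimension.

    Assumes len(lower) == len(upper) <= len(primes).
    """
    dim = len(lower)
    return [
        [lower[d] + halton(i, primes[d]) * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(1, size + 1)
    ]
```

In base 2 the first three values are 0.5, 0.25, and 0.75, so successive individuals fill the interval evenly instead of clustering as pseudo-random draws can, which is the diversity benefit the abstract refers to.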

Recommendation Method Based on Attention Mechanism Interactive Convolutional Neural Network

REN Sheng-lan, GUO Hui-juan, HUANG Wen-hao, TANG Zhi-hong, QI Hui

Computer Science. 2022, 49 (10): 126-131. doi:10.11896/jsjkx.220700064

Abstract

To capture the dynamic interaction between users and items during online shopping and improve the accuracy of recommendation systems (RS), a user rating prediction method combining user preference and item attractiveness is proposed. The reviews are divided into user review texts and product review texts, which are fed into two convolutional neural networks (CNN) and combined with an attention mechanism to dynamically capture semantic and contextual information in the texts and obtain adaptive representations of users and items. Subsequently, an interactive attention network analyzes the dynamic interaction between item features and user features to calculate the user's preference for specific items and the attractiveness of the items to a specific user. Finally, the prediction module provides accurate predictions of user ratings for unseen items. Results on experimental datasets show that the proposed method achieves the best performance, with improvements of at least 15.1% in MAE and 13.6% in RMSE over other advanced methods. In addition, Top-K statistical metrics further validate the accuracy of the proposed method for product recommendation.
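For reference, the two rating-prediction metrics quoted above follow their standard definitions, sketched below. The function names are illustrative, and both functions assume non-empty sequences of equal length.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

With true ratings `[5, 3, 4, 1]` and predictions `[4, 3, 5, 3]`, the absolute errors are 1, 0, 1, and 2, so MAE is 1.0 while RMSE is the square root of 1.5, about 1.22. Lower is better for both metrics.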

Prediction of Insulation Deterioration Degree of Cable Joints Based on Temperature and Operation Data

XU Si-qin, HUANG Xiang-qian, YANG Kun, ZHANG Zhan-long, GAN Peng-fei

Computer Science. 2022, 49 (10): 132-137. doi:10.11896/jsjkx.210900139

Abstract

The deterioration of cable joints increases heat loss, which in turn raises the surface temperature of the joints. The surface temperature is also affected by many other factors, such as operating load and ambient temperature. In general, the relationship between deterioration degree and temperature data is non-linear. For this reason, a method that uses an improved sparrow search algorithm (ISSA) to optimize a kernel extreme learning machine (KELM) is proposed to predict the insulation deterioration degree of cable joints. First, based on an experimentally validated multi-physics coupling model of cable joints, the surface temperature distribution data of cable joints at different deterioration levels, loads, and ambient temperatures are obtained to build the training, validation, and test sets. Second, the sparrow search algorithm is improved with the idea of flight behavior from the bird swarm algorithm (BSA), which ensures global convergence without losing population diversity and lets the search escape local optima effectively. Then, ISSA is used to optimize the penalty coefficient C and the kernel function parameter σ of KELM, and the prediction model of the insulation deterioration state is obtained. Results show that ISSA-KELM predicts much better than the other models.

Overview of Person Re-identification for Complex Scenes

ZHANG Min, YU Zeng, HAN Yun-xing, LI Tian-rui

Computer Science. 2022, 49 (10): 138-150. doi:10.11896/jsjkx.211200207

Abstract

Person re-identification (Re-ID) studies the matching of specific persons across multiple disjoint cameras. To the best of our knowledge, this is the first work that uses the types of challenges Re-ID technology must overcome in complex scenes as the basis for classification. It classifies the Re-ID articles published during 2010-2021 into seven categories: person posture issues, occlusion issues, lighting issues, viewpoint issues, background issues, resolution issues, and other open issues. This classification makes it convenient for researchers to start from actual needs and find solutions that correspond to their problems. First, the paper reviews the research background, significance, and research status of Re-ID, summarizes the current mainstream Re-ID framework, counts the papers published at the three top computer vision conferences, i.e., CVPR, ICCV, and ECCV, and counts the Re-ID related projects among national fund projects since 2013. Second, for the seven types of challenges faced in complex scenes, the existing literature is classified and analyzed in detail from two aspects: the causes of the problems and the solutions. The mainstream methods for dealing with each challenge are summarized and listed. Afterwards, we summarize the Re-ID methods with high generalization ability and list the difficulties of current Re-ID research. Finally, the future development trends of Re-ID are discussed.

Spatial Encoding and Multi-layer Joint Encoding Enhanced Transformer for Image Captioning

FANG Zhong-jun, ZHANG Jing, LI Dong-dong

Computer Science. 2022, 49 (10): 151-158. doi:10.11896/jsjkx.210900159

Abstract

Image captioning is one of the hot research topics in computer vision. It is a cross-media data analysis task that combines computer vision and natural language processing: it describes an image by understanding its content and generating captions that are both semantically and grammatically correct. Existing image captioning methods mostly use the encoder-decoder model. Such methods mostly ignore the relative position relationships between visual objects when extracting visual object features from an image, yet these relationships are very important for generating accurate captions. Based on this, this paper proposes a spatial encoding and multi-layer joint encoding enhanced transformer for image captioning. To make better use of the position information contained in the image, this paper proposes a spatial encoding mechanism for visual objects, which converts the independent spatial information of each visual object into relative spatial relationships between visual objects and thus helps the model recognize these relationships. In addition, in the visual object encoder, the top encoding layer retains more semantic information that fits the image but loses part of the image's visual information. Taking this into account, this paper proposes a multi-level joint encoding mechanism that enriches the semantic information in the top encoding layer by integrating the image feature information contained in each shallow encoding layer, so as to obtain richer semantic features that fit the image. The proposed method is evaluated with multiple metrics (BLEU, METEOR, ROUGE-L, CIDEr, etc.) on the MSCOCO dataset. Ablation experiments show that the proposed spatial encoding mechanism and multi-level joint encoding mechanism help generate more accurate and effective image captions. Comparative experimental results show that the proposed method produces accurate and effective image captions and is superior to most of the latest methods.

Robust Hash Learning Method Based on Dual-teacher Self-supervised Distillation

MIAO Zhuang, WANG Ya-peng, LI Yang, WANG Jia-bao, ZHANG Rui, ZHAO Xin-xin

Computer Science. 2022, 49 (10): 159-168. doi:10.11896/jsjkx.210800050

Abstract

To improve the performance of unsupervised hash learning and achieve robust hashing-based image retrieval, this paper proposes a novel robust hash learning method based on dual-teacher self-supervised distillation. Specifically, the proposed method contains two stages: a self-supervised dual-teacher learning stage and a robust hash learning stage. In the first stage, a modified clustering algorithm is designed to effectively improve the accuracy of hard pseudo labels. Then, we fine-tune the teacher networks with the hard pseudo labels to obtain the initial soft pseudo labels. In the second stage, we filter the initial soft pseudo labels with our soft pseudo label denoising method, which combines a hybrid denoising strategy and a dual-teacher denoising strategy. Then, we train the student network on the denoised soft pseudo labels by knowledge distillation, so that robust hash codes for label-free images are obtained. Extensive experiments on the CIFAR-10, FLICKR25K, and EuroSAT datasets show that the proposed robust hash learning method outperforms the state-of-the-art methods. In detail, the MAP of our method is 18.6% higher than that of the TBH method on CIFAR-10, 2.4% higher than that of the DistillHash method on FLICKR25K, and 18.5% higher than that of the ETE-GAN method on EuroSAT.

Mutual Learning Knowledge Distillation Based on Multi-stage Multi-generative Adversarial Network

HUANG Zhong-hao, YANG Xing-yao, YU Jiong, GUO Liang, LI Xiang

Computer Science. 2022, 49 (10): 169-175. doi:10.11896/jsjkx.210800250

Abstract

Traditional knowledge distillation methods for image classification suffer from insufficient distillation efficiency, single-stage training, complex training processes and difficult convergence. To address these problems, this paper designs a mutual learning knowledge distillation method based on multi-stage multi-generative adversarial networks (MS-MGANs). Firstly, the whole training process is divided into several stages, and the teacher models obtained at different stages guide the student models to achieve better accuracy. Secondly, a layer-wise greedy strategy is introduced to replace the traditional end-to-end training mode, and a layer-wise training strategy based on convolution blocks is adopted to reduce the number of parameters to be optimized in each iteration and further improve the distillation efficiency of the model. Finally, a generative adversarial structure is introduced into the knowledge distillation framework, with the teacher model as the feature discriminator and the student model as the feature generator, so that the student model can better follow or even surpass the performance of the teacher model while continuously imitating it. The proposed method is compared with other advanced knowledge distillation methods on several public image classification datasets, and the experimental results show that it achieves better performance in image classification.

Study on 3D Motion-in-Depth Perception Based on Binocular Vision

LU Ping, ZHANG Di, XIAO Jun-feng, BI Ke

Computer Science. 2022, 49 (10): 176-182. doi:10.11896/jsjkx.220500265

Abstract

Obtaining stereoscopic information is one of the basic abilities by which humans perceive the world. Through stereo vision, we can judge the shape, size, distance and relative position of objects, as well as the direction and speed of their motion. Among these, the perception of moving objects plays an important role in stereo vision. The acquisition of motion visual information is not only a key ability that lets biological vision systems survive in a dynamic world, but also an important means for artificial vision systems to process stereoscopic video efficiently. Therefore, to design a 3D motion-in-depth perception model that conforms to the visual characteristics of human eyes, it is necessary to identify explicitly the salient features of human perception of stereoscopic motion and to design experiments that explore them. In this paper, stereoscopic motion videos are designed as visual stimuli based on monocular and binocular cues, and subjective experiments are designed using the control variable method. The experiments explore two questions: the influence of the relative distance between the target and the reference sphere on the subjects' perception ability, and the relationship between the actual movement direction of the target and the direction perceived by the subjects. Experimental data are analyzed using two behavioral measures: the percentage of successfully intercepted targets and the perceived bias. The conclusions are as follows. First, the smaller the relative distance between the target and the reference, the higher the interception success rate. The target velocity and the reference's motion radius affect the relative distance between the target and reference spheres. This indicates that the relative positional relationship between the target and the reference plays an important role in the human eye's perception of moving objects. Motion perception is relative to some degree, and motion is easier to perceive at positions that have a reference point and at positions close to the reference point. Second, perceptions elicited by motion in depth are more pronounced than those induced by lateral motion. The correct interception rate for the depth direction is 42.67%~47.01% higher than that for lateral motion. This shows that the visual stimulation brought by motion in depth is more obvious, and that the ability to perceive objects moving in different directions is asymmetric. However, when an interception error occurs in the depth direction, the perceptual deviation is larger, at about 0.1583~0.3665. This study explores the salient features of human motion perception and provides insights into the observer's process of motion perception in 3D environments. It also provides a new subjective reference standard for judging the perception effect of 3D motion perception models in subsequent design work, which refines the original stereo perception ability index.

Neural Architecture Search for Light-weight Medical Image Segmentation Network

ZHANG Fu-chang, ZHONG Guo-qiang, MAO Yu-xu

Computer Science. 2022, 49 (10): 183-190. doi:10.11896/jsjkx.210800052

Abstract

Most of the existing medical image segmentation models with excellent performance are manually designed by domain experts. The design process usually requires a lot of professional knowledge and repeated experiments. In addition, an overly complex segmentation model not only places high demands on hardware resources, but also has low segmentation efficiency. A neural architecture search method named Auto-LW-MISN (Automatically Light-weight Medical Image Segmentation Network) is proposed for the automatic construction of light-weight medical image segmentation networks. By constructing a light-weight search space, designing a search super network for medical image segmentation, and designing a differentiable search strategy with complexity constraints, this paper establishes a neural architecture search framework for the automatic search of light-weight medical image segmentation networks. Experimental results on microscope cell images, liver CT images and prostate MR images show that Auto-LW-MISN can automatically construct light-weight segmentation models for different modalities of medical images, and that its segmentation accuracy is improved compared with U-net, Attention U-net, Unet++ and NAS-Unet.

Cross-scale Feature Fusion Self-attention for Image Captioning

WANG Ming-zhan, JI Jun-zhong, JIA Ao-zhe, ZHANG Xiao-dan

Computer Science. 2022, 49 (10): 191-197. doi:10.11896/jsjkx.220600009

Abstract

In recent years, the encoder-decoder framework based on the self-attention mechanism has become the mainstream model in image captioning. However, self-attention in the encoder only models the visual relations of low-scale features and ignores some effective information in high-scale visual features, which affects the quality of the generated descriptions. To solve this problem, this paper proposes a cross-scale feature fusion self-attention (CFFSA) method for image captioning. Specifically, CFFSA integrates low-scale and high-scale visual features in self-attention to widen the range of attention from a visual perspective, which increases effective visual information and reduces noise, thereby learning more accurate visual and semantic relationships. Experiments on the MS COCO dataset show that the proposed method can more accurately capture the relationships between cross-scale visual features and generate more accurate descriptions. In addition, CFFSA is a general method that can further improve model performance when combined with other self-attention based image captioning methods.

Object Detection Algorithm Based on Improved Split-attention Network

PAN Yi, WANG Li-ping

Computer Science. 2022, 49 (10): 198-206. doi:10.11896/jsjkx.210800214

Abstract

Most recent object detection algorithms based on convolutional neural networks fail to make reasonable use of meaningful contextual information and easily miss hard targets. To solve these problems, this paper proposes an object detection algorithm based on improved split-attention networks. Firstly, the split-attention mechanism is introduced, and the multi-path structure is combined with a feature-map attention mechanism to improve feature representations. Then, in the convolution layer, poly-scale convolution replaces vanilla convolution to enhance the scale sensitivity of the neural network. Finally, the proposed algorithm is applied to Faster R-CNN. Experiments are carried out on the Pascal VOC and MS COCO datasets. Compared with the original algorithm, the mAP of the proposed algorithm improves by 1.6% and 2.4% respectively without introducing additional parameters or computational complexity, and it is also higher than that of other algorithms, which verifies the good performance of the proposed algorithm.

Voxel Deformation Network Based on Environmental Information Mining

LIU Na-li, TIAN Yan, SONG Ya-dong, JIANG Teng-fei, WANG Xun, YANG Bai-lin

Computer Science. 2022, 49 (10): 207-213. doi:10.11896/jsjkx.210900066

Abstract

3D deformation is one of the hot topics in the field of computer graphics. Current 3D deformation methods mainly learn the changes before and after deformation by aggregating localized adjacent voxel features. They fail to exploit the interrelationship between non-local voxel features, and the absence of contextual information prevents the model from capturing more discriminative features. To address these problems, this paper designs a voxel deformation network based on environmental information mining, which extracts local and environmental information simultaneously and extracts environmental information from different spatial domains to improve the representation performance of the network, further modeling the relationship before and after the deformation of the object. Firstly, a novel self-attention mechanism is introduced. Specifically, the non-local dependence of different voxels is learned to improve the ability of voxel discrimination. Then, a multi-scale analysis method is introduced to extract environmental information in different receptive fields via multiple dilated convolutions with different dilation rates, which provides more informative contextual features for the subsequent models. In addition, this paper analyzes the impact of feature fusion on the model and designs a method based on encoder-decoder feature fusion, which adaptively fuses the features extracted from the encoder and decoder to improve the nonlinear mapping capability of the model. Extensive experiments are conducted on our tooth dataset. The results show that the deformation prediction accuracy of the proposed method is improved compared to existing methods.

Face Image Synthesis Driven by Geometric Feature and Attribute Label

DAI Fu-yun, CHI Jing, REN Ming-guo, ZHANG Qi-dong

Computer Science. 2022, 49 (10): 214-223. doi:10.11896/jsjkx.210900080

Abstract

Current face image synthesis suffers from a lack of diversity in synthetic appearances and expressions, low realism of facial expressions and low synthesis efficiency. To address these problems, this paper proposes a novel face synthesis network model driven by facial geometric features and attribute labels. Given a source face image, a target face image and an attribute label (e.g., hair color, gender, age), the new face synthesis model can generate a highly realistic face image that has the expression of the source face, the identity of the target face and the specified attribute. The new model consists of two parts: a facial landmark generator (FLMG) and a geometry and attribute aware generator (GAAG). FLMG uses facial geometric feature points to encode the expression information, and transfers the expression from the source face to the target face in the form of feature points. Combining the transferred feature points, the specified attribute label and the target face image, GAAG generates a face image with the specified appearance and expression. A novel soft margin triplet perception loss is introduced into GAAG, which makes the synthesized face more natural, preserves the identity of the target face well, and makes GAAG converge faster. Experimental results show that the face images generated by our approach have more diverse appearances and more realistic expressions. In addition, our model only needs to be trained once to realize the transfer between arbitrary expressions, so its efficiency is high.

Survey of Document-level Entity Relation Extraction Methods

FENG Jun, WEI Da-bao, SU Dong, HANG Ting-ting, LU Jia-min

Computer Science. 2022, 49 (10): 224-242. doi:10.11896/jsjkx.211000057

Abstract

As a core task of text mining and information extraction, entity relation extraction aims to identify and determine the specific relation between entity pairs in natural language texts. It provides basic support for intelligent retrieval and semantic analysis, helps to improve search efficiency, and is a research hotspot in the field of natural language processing. Compared with a single sentence, a document contains richer entity relation semantics. Therefore, many recent extraction methods have shifted their research focus from the sentence level to the document level and achieved rich research results. This paper systematically summarizes the mainstream methods and research progress of document-level entity relation extraction in recent years. Firstly, the paper summarizes the problems and challenges of document-level relation extraction, and then introduces a variety of document-level relation extraction methods from three aspects: sequence based, graph based and pre-trained language model based. Finally, the datasets and experiments used by each method are compared and analyzed, and possible future research directions are discussed.

Chinese Keyword Extraction Method Combining Knowledge Graph and Pre-training Model

YAO Yi, YANG Fan

Computer Science. 2022, 49 (10): 243-251. doi:10.11896/jsjkx.210800176

Abstract

Keywords represent the theme of a text and condense its concepts and content. Through keywords, readers can quickly understand the gist and idea of the text and improve the efficiency of information retrieval. In addition, keyword extraction can also provide support for automatic text summarization and text classification. In recent years, research on automatic keyword extraction has attracted wide attention, but how to extract keywords from documents accurately remains a challenge. On the one hand, a keyword reflects people's subjective understanding, so judging whether a word is a keyword is itself subjective. On the other hand, Chinese words are often rich in semantic information, and it is difficult to accurately extract the main idea expressed in the text by relying solely on traditional statistical features and thematic features. Aiming at the problems of low accuracy, information redundancy and missing information in Chinese keyword extraction, this paper proposes an unsupervised keyword extraction method combining a knowledge graph and a pre-training model. Firstly, topic clustering is carried out using the pre-training model, and a sentence-based clustering method is proposed to ensure the coverage of the final selected keywords. Then, the knowledge graph is used for entity linking to achieve accurate word segmentation and semantic disambiguation. After that, the semantic word graph is constructed based on the topic information to calculate the semantic weight between words. Finally, keywords are ranked by the weighted PageRank algorithm. Experiments are conducted on two public datasets, DUC 2001 and CSL, and a separately annotated CLTS dataset, with prediction accuracy, recall rate and F1 score as the indicators in comparative experiments. Experimental results show that the accuracy of the proposed method is improved compared with other baseline methods. On the CLTS dataset, the F1 value is increased by 9.14% compared with the traditional statistical method TF-IDF and by 4.82% compared with the traditional graph method TextRank.
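
The final ranking step can be sketched as a weighted PageRank over an undirected word graph. This is a minimal illustration, not the paper's implementation: the function name `weighted_pagerank`, the damping factor, the iteration count and the toy edge weights are all assumptions, and how the paper derives its semantic weights is not stated in the abstract.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of an undirected weighted graph.

    edges maps an (u, v) pair to a positive weight. Each node passes its
    score to its neighbours in proportion to the edge weights.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    nodes = sorted(neighbors)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    # Total edge weight attached to each node, used to normalise outflow.
    strength = {node: sum(neighbors[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = sum(
                score[m] * w / strength[m] for m, w in neighbors[node].items()
            )
            updated[node] = (1.0 - damping) / n + damping * incoming
        score = updated
    return score

# Toy word graph with hypothetical semantic weights.
edges = {
    ("graph", "keyword"): 3.0,
    ("keyword", "model"): 1.0,
    ("graph", "model"): 1.0,
}
scores = weighted_pagerank(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the strongly connected words "graph" and "keyword" receive equal scores, and "model" ranks last.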

Fast DOM Object Search and Location Algorithm for RPA System

MENG Yuan, QIN Yun-chuan, CAI Yu-hui, LI Ken-li

Computer Science. 2022, 49 (10): 252-257. doi:10.11896/jsjkx.210900210

Abstract

Robotic process automation (RPA) is a business process automation technology based on software robots and artificial intelligence. It can replace or assist humans in completing repetitive work on computers and other equipment. When RPA software automates browser page elements, quickly searching for and locating the target DOM elements while ensuring accuracy is the key technical difficulty in completing a full automation process. Existing location methods, such as XPath and CSS selectors, suffer from slow location speed or inaccurate path location on web pages with complex structure. To solve these problems, a fast DOM object search and location algorithm for RPA systems is proposed: the optimal XPath path algorithm, which analyzes the attributes of elements and generates the optimal path that uniquely locates an element during automatic operation. Experimental results show that the time required to locate elements using the optimal path is only 23.14% of that using the complete XPath path. The algorithm reduces the difficulty of path generation, increases the element positioning speed, and improves automation efficiency.
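
The general idea of generating a short path that still locates an element uniquely can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. This is not the paper's algorithm: the helper name `short_unique_path` and the greedy "try an attribute, then climb to the parent" strategy are assumptions made for illustration, and the sketch ignores attribute values containing quotes.

```python
import xml.etree.ElementTree as ET

def short_unique_path(root, target):
    """Return a short './/...' path matching only `target`, or None."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    tail = []  # bare tags of the nodes below the one being examined
    node = target
    while node is not None:
        # Prefer a single attribute predicate, then fall back to the tag.
        steps = [f"{node.tag}[@{k}='{v}']" for k, v in node.attrib.items()]
        steps.append(node.tag)
        for step in steps:
            path = ".//" + "/".join([step] + tail)
            if root.findall(path) == [target]:
                return path
        tail.insert(0, node.tag)
        node = parent.get(node)
    return None

doc = ET.fromstring(
    "<html><body>"
    "<div id='a'><button class='btn'>OK</button></div>"
    "<div id='b'><button class='btn'>Cancel</button></div>"
    "</body></html>"
)
cancel = doc.find(".//div[@id='b']/button")
path = short_unique_path(doc, cancel)
```

Here both buttons share the same class, so the sketch climbs one level and anchors the path on the parent's `id` instead of spelling out the full path from the document root.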

Trajectory Prediction Method Based on Fusion of Graph Interaction and Scene Perception

FANG Yang, ZHAO Ting, LIU Qi-lie, HE Dong, SUN Kai-wei, CHEN Qian-bin

Computer Science. 2022, 49 (10): 258-264. doi:10.11896/jsjkx.211000172

Abstract

To accurately perceive the environment and predict the trajectories of surrounding traffic participants for autonomous driving, we propose a real-time end-to-end trajectory prediction framework based on bird's-eye view (BEV) that learns interaction and scene information simultaneously. The framework consists of two essential modules: a graph interaction network and a pyramid perception network. The former encodes the interaction patterns among traffic participants through a spatiotemporal graph convolutional network, and the latter adopts a spatiotemporal pyramid network to model the surrounding information and obtain the scene features. Next, interactive features and scene features are fused at a unified scale to perform classification and trajectory prediction tasks. Experiments and analysis on nuScenes, a large open-source dataset, indicate that the proposed framework achieves 3.1% higher classification accuracy and 1.43% lower predicted trajectory loss than MotionNet. Hence, our framework outperforms state-of-the-art algorithms in terms of generalization and robustness, and is more in line with perception requirements in actual autonomous driving scenes.

Multi-turn Dialogue Technology and Its Application in Power Grid Data Query

WANG Kai, LI Zhou-jun, SHENG Wen-bo, CHEN Shu-wei, WANG Ming-xuan, LIU Jian-qing, LAN Hai-bo, ZHANG Rui

Computer Science. 2022, 49 (10): 265-271. doi:10.11896/jsjkx.200600078

Abstract

With the integration of information technology and traditional industries, it has become a trend to use computer-controlled machines instead of humans to perform repetitive, boring and even dangerous tasks. To interact effectively with computers in natural language, human-computer interaction and dialogue systems based on multi-turn dialogue technology have become a research hotspot in artificial intelligence and natural language processing. In the grid control system, the dispatcher needs to perform a large number of query operations manually. To reduce the complexity of the existing dispatching system and speed up dispatchers' emergency handling, multi-turn dialogue technology can be applied to realize intelligent voice query of power grid data. This paper first describes the basic architecture of the task-oriented multi-turn dialogue system, including the functions and related algorithms of its three modules: natural language understanding, dialogue management, and natural language generation. Next, to meet the demand of power grid companies for specific scenarios such as intelligent data queries, this paper designs and implements a multi-module task-oriented multi-turn dialogue system whose core modules are a natural language understanding module, a dialogue management module, a natural language generation module and a knowledge base. The grid dispatcher can ask the system questions and get answers in natural language. This process does not require keyboard or mouse operations, which greatly improves the speed and convenience of grid information queries.

Research on Verifiable Keyword Search over Encrypted Cloud Data:A Survey

ZHOU Qian, DAI Hua, SHENG Wen-jie, HU Zheng, YANG Geng

Computer Science. 2022, 49 (10): 272-278. doi:10.11896/jsjkx.220500285

Abstract

The convenience and efficiency of cloud computing give it great potential for development. More and more enterprises and individuals obtain real benefits by using the various outsourcing services provided by cloud computing. To protect the confidentiality and integrity of outsourced data in the cloud, keyword search over encrypted cloud data with privacy protection and integrity verification is becoming a research hotspot in the field of cloud computing. This paper focuses on verifiable keyword search over encrypted data. The system models, threat models and frameworks adopted in existing works are first introduced. Related works are then reviewed from the aspects of verifiable single-keyword search and verifiable multi-keyword search over encrypted data, and the ideas of these works are briefly described together with their advantages and disadvantages. Finally, conclusions are drawn from a comprehensive analysis and comparison of the related works, and possible future research directions and trends are discussed.

Defense Method Against Code Reuse Attack Based on Real-time Code Loading and Unloading

HOU Shang-wen, HUANG Jian-jun, LIANG Bin, YOU Wei, SHI Wen-chang

Computer Science. 2022, 49 (10): 279-284. doi:10.11896/jsjkx.220500091

Abstract

In recent years, code reuse attacks have become a mainstream attack against binary programs. A code reuse attack such as ROP uses the instruction gadgets in the memory space to construct an instruction sequence that realizes specific functions and achieves malicious purposes. Based on the basic principle of the code reuse attack, this paper proposes a defense method based on real-time function loading and unloading. More specifically, the method shrinks the code space by dynamic loading and unloading to reduce the attack surface and defend against code reuse. First, it extracts sufficient function information from the dependent libraries of the target program by static analysis, and uses this information in the form of replacement libraries. Second, it introduces real-time loading into the dynamic loader in Linux, and proposes an auto-triggerable and auto-restorable loading/unloading mechanism. To reduce the high overhead caused by frequent unloading, a randomized batch unloading mechanism is designed. Finally, experiments are carried out in a real environment to verify the effectiveness of the scheme against code reuse attacks, and the significance of the randomized unloading strategy is demonstrated.

Locally Black-box Adversarial Attack on Time Series

YANG Wen-bo, YUAN Ji-dong

Computer Science. 2022, 49 (10): 285-290. doi:10.11896/jsjkx.210900254

Abstract

Deep neural networks (DNNs) for time series classification have potential security concerns due to their vulnerability to adversarial attacks. Existing attack methods on time series perform global perturbation based on gradient information, and the generated adversarial examples are easily perceived. This paper proposes a locally black-box method that attacks DNNs without gradient information. First, the attack is described as a constrained optimization problem under the assumption that the method cannot obtain any internal information of the model, and a genetic algorithm is employed to solve it. Second, since time series shapelets provide the most discriminative information among different categories, they are used as the local perturbation interval. Experimental results on UCR datasets that have potential security concerns indicate that the proposed method can effectively attack DNNs and generate adversarial samples. In addition, compared with the benchmark, the method significantly reduces the mean squared error while keeping a high success rate.

Distributed Privacy Protection Data Search Scheme

LIU Ming-da, SHI Yi-juan, RAO Xiang, FAN Lei

Computer Science. 2022, 49 (10): 291-296. doi:10.11896/jsjkx.210900233

Abstract

High-sensitivity data in the cloud creates data islands, in which data cannot be searched, discovered or shared. To address this problem, a distributed privacy protection data search scheme is proposed that realizes two-way confidentiality of data and search conditions in distributed scenarios and allows a trusted search certificate to be established. Firstly, the data model, the protection objectives and the application scenarios of the scheme are defined. Next, the design framework and protocol flow of the scheme are proposed, focusing on the overall flow of three parts: a trusted data interaction channel based on blockchain, a trusted key sharing module and a ciphertext search engine. Then, tantivy SGX, a full-text search engine that operates on ciphertext and is based on a trusted execution environment, is proposed, and its principle and implementation method are analyzed in detail. Finally, the overall process and core methods are implemented and verified. Experiments show that the scheme is efficient and feasible, and can effectively enhance the security of data discovery and search in distributed environments.

Reputation-based Blockchain Sharding Consensus Scheme

WANG Meng-nan, HUANG Jian-hua, SHAO Xing-hui, MAI Yong

Computer Science. 2022, 49 (10): 297-309. doi:10.11896/jsjkx.210800227

Abstract

Sharding is a technology that addresses the capacity expansion problem of blockchain. However, sharding may make it easier for malicious nodes to concentrate in a single shard, thus hindering the safe operation of the entire system. This paper proposes a reputation-based sharding consensus protocol (RBSCP), which establishes a reputation mechanism to measure node behavior and encourage nodes to follow the protocol. The reputation level-based sharding method reduces the difference in the reputation level distribution across shards, so as to prevent malicious nodes from concentrating in a single shard to misbehave. A double-chain model combining a verification chain and a record chain is proposed. Through the differentiated storage of transactions, the storage capacity of the blockchain is expanded while its security is improved. By associating voting shares with node reputation and differentiating node commitments, a reputation-based fast Byzantine fault tolerance (RFBFT) algorithm is proposed, which enables honest nodes to reach consensus faster and reduces the impact of malicious nodes. Security analysis shows that RBSCP can guarantee the rationality of node distribution in shards and the security of the consensus process, and can prevent double-spending and nothing-at-stake attacks. Experimental results show that RBSCP achieves low sharding latency, low consensus latency and high throughput while ensuring security.
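
One simple way to keep the reputation distribution similar across shards is to rank nodes by reputation and deal them out in alternating ("snake") order. The abstract does not describe RBSCP's actual assignment rule, so this is only an illustrative sketch; the function name `balanced_shards` and the reputation values are hypothetical.

```python
def balanced_shards(reputation, num_shards):
    """Assign nodes to shards so each shard gets a mix of reputation levels.

    reputation maps a node id to a score. Nodes are sorted from highest
    to lowest and dealt left-to-right, then right-to-left, and so on.
    """
    ranked = sorted(reputation, key=reputation.get, reverse=True)
    shards = [[] for _ in range(num_shards)]
    for i, node in enumerate(ranked):
        round_index, position = divmod(i, num_shards)
        # Reverse direction on every other round to even out the totals.
        if round_index % 2 == 1:
            position = num_shards - 1 - position
        shards[position].append(node)
    return shards

# Hypothetical reputation scores for six nodes split into two shards.
reputation = {"n1": 0.9, "n2": 0.8, "n3": 0.5, "n4": 0.4, "n5": 0.2, "n6": 0.1}
shards = balanced_shards(reputation, 2)
totals = [sum(reputation[n] for n in shard) for shard in shards]
```

With these values the two shards end up with total reputation of about 1.5 and 1.4, so neither shard collects only low-reputation nodes.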

PGNFuzz:Pointer Generation Network Based Fuzzing Framework for Industry Control Protocols

WANG Tian-yuan, WU Shu-hong, LI Zhao-ji, XIN Hao-guang, LI Xuan, CHEN Yong-le

Computer Science. 2022, 49 (10): 310-318. doi:10.11896/jsjkx.210700248

Abstract

Industrial security has always been an important and urgent issue globally. Industrial control protocols are widely used in the communication between industrial control system (ICS) components. Their security is related to the safe and stable operation of the entire system, and there is an urgent need to ensure the security of all industrial control protocols. Network protocol fuzzing plays an important role in ensuring the security and reliability of ICS. Traditional fuzzing methods can improve the security testing of industrial control protocols, and many of them have practical applications. However, most traditional fuzzing methods rely heavily on the specifications of industrial control protocols, making the test process costly, time-consuming and cumbersome. If the specification does not exist, the task is difficult to carry out. This paper proposes an intelligent and automatic protocol fuzzing method based on pointer-generation networks (PGN), and gives a series of performance indicators. On the basis of this method, PGNFuzz, an intelligent and automatic fuzzing framework that can be used for various industrial control protocols, is designed. Several typical industrial control protocols such as Modbus and EtherCAT are used to test the validity and efficiency of the framework. Experimental results show that our method is superior to other general purpose fuzzers (GPF) and other deep learning based fuzzing methods in terms of convenience, effectiveness and efficiency.

Field Segmentation of Binary Protocol Based on Probability Model

YANG Zi-ji, PAN Yan, ZHU Yue-fei, LI Xiao-wei

Computer Science. 2022, 49 (10): 319-326. doi:10.11896/jsjkx.210800268

Abstract

Field segmentation is the basis of protocol format inference. The subsequent steps of protocol format inference, such as message structure identification, field semantic inference and field value constraint inference, depend heavily on the quality of field segmentation. Field segmentation of binary protocols is a big challenge because of the lack of character encoding and delimiters, the flexibility of field length and the wide range of field values. To improve feature construction and decision rules, this paper proposes a novel binary protocol field segmentation method based on a probability model. First, it constructs the field boundary constraints of binary protocol messages from the internal structure of each message and the value changes between messages. Then, it combines the various constraints probabilistically, calculating the probability that each position is a boundary with a factor graph model. Finally, the most likely field boundaries are obtained from these probabilities. Experiments show that the proposed method achieves more accurate and robust results than traditional methods in binary protocol field segmentation.

Lazy-mode Ciphertext-update Based Approach for CP-ABE Attribute Change

LEI Xue-jiao, WANG Yin-long, Nurmamat HELIL

Computer Science. 2022, 49 (10): 327-334. doi:10.11896/jsjkx.211000189

Ciphertext-policy attribute-based encryption (CP-ABE) can be used to realize secure data sharing in cloud computing environments. However, user attribute change (attribute revocation and addition) in CP-ABE is a tricky problem. Generally, attribute change is realized through the proxy server's re-encryption of ciphertexts and key update. However, when an attribute change is enforced, all ciphertexts related to that attribute must be updated. This paper proposes a user attribute change approach based on lazy-mode ciphertext update. It analyzes the user's ability to access the ciphertexts involved in the attribute change (before attribute revocation or after attribute addition) and determines whether these ciphertexts need to be updated, which minimizes the scope of ciphertexts to be updated and reduces the number of updates. The approach improves efficiency by avoiding unnecessary ciphertext updates and shortening the ciphertext, while preserving the original security features of CP-ABE. Finally, a small-scale test is conducted to verify the correctness of the proposed approach.
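The lazy-update decision for the revocation case can be sketched as follows. This involves no cryptography and is not the paper's algorithm: it assumes access policies are AND/OR trees over attribute names and that a ciphertext needs updating only when the user could satisfy its policy before the revocation and can no longer satisfy it afterwards. The policy encoding and both function names are hypothetical.

```python
def satisfies(policy, attrs):
    """Evaluate an access policy against a set of attributes.

    A policy is either an attribute name (str) or a tuple
    ("AND" | "OR", [sub-policies]).
    """
    if isinstance(policy, str):
        return policy in attrs
    op, children = policy
    results = (satisfies(child, attrs) for child in children)
    return all(results) if op == "AND" else any(results)

def needs_update_on_revocation(policy, user_attrs, revoked):
    """Assumed rule: update only if the revocation actually removes access."""
    before = satisfies(policy, user_attrs)
    after = satisfies(policy, user_attrs - {revoked})
    return before and not after

# Hypothetical policy: doctor AND (cardiology OR oncology)
policy = ("AND", ["doctor", ("OR", ["cardiology", "oncology"])])

lost_access = needs_update_on_revocation(
    policy, {"doctor", "cardiology"}, "cardiology")
never_had_access = needs_update_on_revocation(
    policy, {"nurse", "cardiology"}, "cardiology")
still_has_access = needs_update_on_revocation(
    policy, {"doctor", "cardiology", "oncology"}, "cardiology")
```

Under this assumed rule only the first ciphertext would be updated; the other two are skipped because the revocation does not change what the user can decrypt.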

ZKFERP:Universal and Efficient Range Proof Scheme with Constant Computational Cost

LI Yi-cong, ZHOU Kuan-jiu, WANG Zi-zhong, XU Lin

Computer Science. 2022, 49 (10): 335-343. doi:10.11896/jsjkx.210900044

The decentralization of blockchain can easily lead to the leakage of users' private data at the transaction layer, which in turn causes information security issues. A zero-knowledge range proof confidentially verifies that transaction data lies within a legal positive integer range without revealing the data itself, which effectively addresses blockchain privacy leakage. Existing blockchain range proof schemes can still be optimized in terms of proof speed, verification speed and computational cost. In addition, existing solutions cannot handle floating-point numbers, which limits the application fields of range proofs. This paper proposes ZKFERP, an efficient range proof scheme with constant computational cost that applies to both floating-point numbers and integers. It improves the zero-knowledge protocol based on Bulletproofs to optimize the proof structure. A Lagrangian inner product vector generation method is designed to make the witness generation time constant, and the commitment is constructed according to the floating-point number range relationship to implement floating-point range proofs. ZKFERP relies only on the discrete logarithm assumption and requires no trusted third party. The communication cost and time complexity of ZKFERP are constant. Experimental results show that, compared with the most advanced known range proof scheme, ZKFERP increases proof speed by 40.0% and verification speed by 29.8%.
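For background, the arithmetic relations that a Bulletproofs-style range proof establishes for an integer can be checked in the clear. The sketch below shows only the standard statement that v lies in [0, 2^n): v equals the inner product of a bit vector a_L with the powers of two, and a_L ∘ (a_L − 1) = 0 forces every entry to be a bit. It contains no commitments or zero-knowledge machinery and does not reproduce ZKFERP's floating-point construction; the function names are hypothetical.

```python
def bit_decompose(v, n):
    """Little-endian bits of v; requires 0 <= v < 2**n."""
    if not 0 <= v < 2 ** n:
        raise ValueError("value outside [0, 2^n)")
    return [(v >> i) & 1 for i in range(n)]

def check_range_relations(v, a_L):
    """Check the two relations that certify v is in [0, 2^len(a_L))."""
    a_R = [b - 1 for b in a_L]                      # a_R = a_L - 1^n
    inner = sum(b * (1 << i) for i, b in enumerate(a_L))
    entries_are_bits = all(l * r == 0 for l, r in zip(a_L, a_R))
    return inner == v and entries_are_bits

honest = check_range_relations(37, bit_decompose(37, 8))
# A cheating witness with a non-bit entry still sums to 37 but is rejected.
cheating = check_range_relations(37, [37, 0, 0, 0, 0, 0, 0, 0])
```

In an actual proof these relations are verified over committed vectors, so the verifier learns that they hold without learning v.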

Adaptive Histogram Publishing Algorithm for Sliding Window of Data Stream

WANG Xiu-jun, MO Lei, ZHENG Xiao, GAO Yun-quan

Computer Science. 2022, 49 (10): 344-352. doi:10.11896/jsjkx.210700242

As one of the most effective privacy protection mechanisms, differential privacy has been widely used in many fields. Existing histogram publishing methods, for either static or dynamic data sets, mainly protect the privacy of sliding windows in data streams by adding uniform noise. This leads to low data availability, high time complexity and weak privacy protection in practical applications. This paper tackles the problem by integrating approximate counting techniques into differential privacy and proposing an adaptive histogram publishing method for sliding windows (APS). Firstly, APS predicts the distributional information of the sliding windows in the data stream by using an approximate counting method. Secondly, it computes an appropriate value for publishing by checking the difference between estimated values and actual values. Finally, it reduces statistical errors within each interval by clustering. Theoretical analysis shows that the APS algorithm can effectively improve data availability and reduce running time while reducing the privacy budget. Experimental results on two real data sets also verify the superiority of the APS algorithm over existing grouping-based histogram publishing algorithms in terms of data availability and running time.
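The uniform-noise baseline that the abstract contrasts with APS can be sketched with the standard Laplace mechanism. This is the baseline only, not the APS algorithm. It assumes neighbouring windows differ by adding or removing one item, so each histogram has L1 sensitivity 1 and the noise scale is 1/epsilon; the function names and the example stream are hypothetical.

```python
import random
from collections import Counter, deque

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_window_histogram(window, bins, epsilon, rng):
    """Publish one window's histogram with the same noise scale on every bin."""
    counts = Counter(window)
    scale = 1.0 / epsilon  # sensitivity 1 under add/remove-one neighbours
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}

rng = random.Random(0)
bins = ["a", "b", "c"]
window = deque(maxlen=4)           # sliding window over the last 4 items
published = []
for item in ["a", "a", "b", "a", "c"]:
    window.append(item)            # the oldest item drops out automatically
    published.append(noisy_window_histogram(window, bins, epsilon=1.0, rng=rng))
```

Every bin of every window receives noise of the same scale regardless of how the data is distributed, which is the behaviour an adaptive method aims to improve on.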

Study on Distributed Intrusion Detection System Based on Artificial Intelligence

WANG Lu, WEN Wu-song

Computer Science. 2022, 49 (10): 353-357. doi:10.11896/jsjkx.220700095

To address the data processing defects and low intrusion detection accuracy of current dynamic loading systems, a fully functional and practical distributed intrusion detection system is designed, taking the application of artificial intelligence technology as an example. Firstly, once the system architecture and database design are complete, the control center and the extended network hosts of the subregional control centers are analyzed comprehensively, and corresponding response countermeasures are formulated in strict accordance with the relevant rules of the response library. Secondly, the communication module is used to judge whether a given behavior is an abnormal intrusion. Thirdly, the S5720S-28P-SI-AC 24-port core switch is used to exchange related data. Then, the AD2032 alarm responder is selected to monitor external intrusion behavior comprehensively. In addition, based on a comprehensive analysis of the main communication implementation, Libpcap library functions are used to design the intrusion detection process test. The results show that, with the application of artificial intelligence technology, the distributed intrusion detection system designed in this paper achieves a detection accuracy of 99%, which provides an important platform for the subsequent secure and stable use of the network.