Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 50 Issue 6, 15 June 2023
  
High Performance Computing
Many-core Optimization Method for the Calculation of Ab initio Polarizability
LUO Haiwen, WU Yangjun, SHANG Honghui
Computer Science. 2023, 50 (6): 1-9.  doi:10.11896/jsjkx.220700162
Density-functional perturbation theory (DFPT), based on quantum mechanics, can be used to calculate a variety of physicochemical properties of molecules and materials and is now widely used in the research of new materials. Meanwhile, heterogeneous many-core processor architectures are becoming the mainstream of supercomputing. Therefore, redesigning and optimizing DFPT programs for heterogeneous many-core processors to improve their computational efficiency is of great importance for the computation of physicochemical properties and their scientific applications. In this work, the computation of the first-order response density and the first-order response Hamiltonian matrix in DFPT is optimized for many-core processor architectures and verified on the new-generation Sunway processors. The optimization techniques include loop tiling, discrete memory access processing and collaborative reduction. Loop tiling divides tasks so that they can be executed by many cores in parallel; discrete memory access processing converts discrete accesses into more efficient contiguous memory accesses; collaborative reduction solves the write-conflict problem. Experimental results show that the optimized program outperforms the pre-optimization program by 8.2 to 74.4 times on one core group, with good strong and weak scalability.
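The loop-tiling and collaborative-reduction techniques above are general many-core patterns. As a minimal illustrative sketch (plain Python, not the authors' Sunway code), tiling an iteration space and reducing each tile privately before a final combine avoids write conflicts on a shared accumulator:

```python
def tiled_sum(values, tile_size):
    """Split the iteration space into tiles (one per core in a real
    many-core setting) and reduce each tile independently, then combine
    the partial results, so no two workers write the same accumulator."""
    tiles = [values[i:i + tile_size] for i in range(0, len(values), tile_size)]
    partial = [sum(t) for t in tiles]   # each tile reduces privately
    return sum(partial)                 # collaborative combine of partials
```

In the many-core version each tile's partial sum would be computed by a separate core; the sequential sketch only shows the task decomposition.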
Implementation and Optimization of Apache Spark Cache System Based on Mixed Memory
WEI Sen, ZHOU Haoran, HU Chuang, CHENG Dazhao
Computer Science. 2023, 50 (6): 10-21.  doi:10.11896/jsjkx.220900261
With the increasing data scale of the "big data era", in-memory computing frameworks have grown significantly. The mainstream in-memory computing framework Apache Spark uses memory to cache intermediate results, which greatly improves data processing performance. At the same time, non-volatile memory (NVM), with its fast read and write performance, has great development prospects in the field of in-memory computing, so there is huge promise in building Spark's cache with a mix of DRAM and NVM. In this paper, a Spark cache system based on DRAM-NVM hybrid memory is proposed. It adopts a flat hybrid cache model, designs a dedicated data structure for cache-block management, and presents the overall architecture of the hybrid cache system for Spark. In addition, to keep frequently accessed cache blocks in DRAM, a hybrid cache management strategy based on the minimum reuse cost of cache blocks is proposed: the future reuse of each RDD is obtained from DAG information, cache blocks with high future reuse counts are stored in the DRAM cache first, and migration cost is taken into account when a cache block is migrated. Experiments show that the DRAM-NVM hybrid cache improves performance by 53.06% on average compared with the original cache system, and the proposed strategy improves by 35.09% on average over the default cache strategy on the same hybrid memory. Moreover, with only 1/4 DRAM and 3/4 NVM as the cache, the designed hybrid system achieves 85.49% of the performance of an all-DRAM cache.
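As a rough illustration of the placement idea (not the paper's implementation; `place_blocks` and its inputs are hypothetical), a greedy policy that keeps the blocks with the highest future reuse counts in DRAM might look like:

```python
def place_blocks(blocks, dram_capacity):
    """Greedy sketch: keep the blocks with the highest future reuse count
    (derived from the DAG in the paper) in fast DRAM; the rest go to NVM.
    `blocks` maps block id -> (size, future_reuse_count)."""
    order = sorted(blocks, key=lambda b: blocks[b][1], reverse=True)
    dram, nvm, used = [], [], 0
    for b in order:
        size, _ = blocks[b]
        if used + size <= dram_capacity:
            dram.append(b)
            used += size
        else:
            nvm.append(b)
    return dram, nvm
```

The paper's strategy additionally weighs migration cost when moving blocks between tiers, which this static sketch omits.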
Parallel DVB-RCS2 Turbo Decoding on Multi-core CPU
ZHAI Xulun, ZHANG Yongguang, JIN Anzhao, QIANG Wei, LI Mengbing
Computer Science. 2023, 50 (6): 22-28.  doi:10.11896/jsjkx.230300005
DVB-RCS2 is widely used in satellite broadcasting, maritime satellite communication and military satellite communication. For high-throughput software decoding of the dual-binary Turbo codes in DVB-RCS2 on software-defined radio platforms, a high-speed parallel software decoding scheme based on multi-core CPUs is proposed. Firstly, the computational complexity of dual-binary and traditional binary Turbo codes is compared and analyzed. Then, a parallel decoding implementation based on multi-core CPUs is designed, and the memory footprint and the input quantization method for parallel computing with 8-bit integer data are analyzed and optimized. Finally, the software decoder exceeds 169 Mbps information throughput using SSE instructions on a 12-core Intel CPU, with BER performance degradation of less than 0.1 dB compared to a floating-point decoder. The results show that the proposed implementation is a competitive alternative to GPU implementations in terms of throughput and energy efficiency, and has high application value in high-speed satellite receivers.
Lock-free Parallel Semi-naive Algorithm Based on Multi-core CPU
YU Ting, WANG Lisong, QIN Xiaolin
Computer Science. 2023, 50 (6): 29-35.  doi:10.11896/jsjkx.220800050
Datalog systems have a wide range of applications in fields such as graph databases, networking, and static program analysis. When dealing with massive data, a serial Datalog evaluation strategy cannot fully utilize the computational resources of modern multi-core processors. To address this problem, a lock-free parallel semi-naive algorithm, parallel semi-naive (PSN), based on multi-core CPUs is proposed to support efficient Datalog evaluation. PSN uses a B+-tree index to partition data and allocates partitions to different threads for computation. The intermediate result tuples generated from different partitions are disjoint, which enables lock-free parallelism during computation. PSN indexes intermediate results with a two-level hash table to speed up duplicate checking; each thread only operates on its own region, so no locks are needed to prevent write conflicts. PSN adopts a task queue and a thread pool to assign tasks to idle threads for multi-thread load balancing. Experimental results on public datasets from the Koblenz network collection (KONECT) and the Stanford network analysis platform (SNAP) show that the PSN algorithm improves the query performance of Datalog rules.
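The semi-naive strategy being parallelized here can be shown on the classic transitive-closure program. This single-threaded Python sketch captures the delta-driven iteration, in which only the newly derived tuples are joined in each round (the paper's contribution is the lock-free parallel version, which this does not show):

```python
def semi_naive_tc(edges):
    """Semi-naive evaluation of:
        path(x, y) :- edge(x, y).
        path(x, w) :- path(x, z), edge(z, w).
    Only the delta (tuples new in the previous round) is joined each round."""
    path = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, w) for (x, z) in delta for (z2, w) in edges if z == z2}
        delta = derived - path   # keep only genuinely new tuples
        path |= delta
    return path
```

In PSN, the partitions of `delta` assigned to different threads produce disjoint tuple sets, so the union step needs no locking.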
Virtual Machine Consolidation Algorithm Based on Decision Tree and Improved Q-learning by Uniform Distribution
SHI Liang, WEN Liangming, LEI Sheng, LI Jianhui
Computer Science. 2023, 50 (6): 36-44.  doi:10.11896/jsjkx.220300192
As cloud data centers grow in scale, problems caused by sub-optimal virtual machine consolidation, such as high energy consumption, low resource utilization and reduced quality of service, become increasingly prominent. Therefore, this paper proposes DTQL-UD, a virtual machine consolidation algorithm based on decision trees and Q-learning improved by uniform distribution. It uses a decision tree to characterize states and, when evaluating the next state-action value, selects the next action under a uniform distribution. It can optimize decision-making with real-time feedback from the state of the cloud data center to the virtual machine migration process. In addition, to narrow the gap between simulator and real world in reinforcement learning, the simulator is trained with a supervised learning model on a large amount of real cluster load traces to improve its fidelity. Experimental results show that, compared with existing heuristic methods, DTQL-UD improves energy consumption, resource utilization, quality of service, number of virtual machine migrations, and number of remaining active hosts by 14%, 12%, 21%, 40%, and 10%, respectively. Owing to the stronger feature-extraction capability of decision trees on tabular data, DTQL-UD learns better scheduling strategies than existing deep reinforcement learning (DRL) methods. In our experiments, as the cluster size increases, the proposed algorithm reduces the training time of traditional reinforcement learning models by 60% to 92%.
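The "uniform distribution" refinement replaces the usual max over next-state actions with a uniform average over them. An illustrative tabular update (a hypothetical function, not the DTQL-UD code) might be:

```python
def q_update_uniform(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular update where the next-state value is the uniform average
    over actions (a reading of 'improved Q-learning by uniform distribution')
    instead of the standard Q-learning max_b Q(s', b)."""
    expected_next = sum(Q[(s_next, b)] for b in actions) / len(actions)
    target = r + gamma * expected_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

Averaging rather than maximizing over next actions reduces the over-estimation bias that the max operator introduces in noisy environments.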
QR Decomposition Based on Double-double Precision Gram-Schmidt Orthogonalization Method
JIN Jiexi, XIE Hehu, DU Peibing, QUAN Zhe, JIANG Hao
Computer Science. 2023, 50 (6): 45-51.  doi:10.11896/jsjkx.230200209
The Gram-Schmidt orthogonalization algorithm and its modified variants often show numerical instability when applied to ill-conditioned or large-scale matrices. To address this problem, this paper analyzes the accumulation of round-off errors in the modified Gram-Schmidt algorithm (MGS), and then designs and implements a double-double precision modified Gram-Schmidt orthogonalization algorithm (DDMGS) based on error-free transformation techniques and double-double precision arithmetic. Accuracy tests show that DDMGS has better numerical stability than the BMGS_SVL, BMGS_CWY, BCGS_PIP and BCGS_PIO variants, demonstrating that DDMGS effectively reduces the loss of orthogonality and improves numerical accuracy. In the performance test, the floating-point operation counts (flops) of the algorithms are measured, and DDMGS is compared with MGS on ARM and Intel processors: the runtime of DDMGS is about 5.03 and 18.06 times that of MGS, respectively, but the accuracy improves significantly.
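Double-double arithmetic builds on error-free transformations. The standard TwoSum transformation (Knuth), which returns the rounded sum together with its exact rounding error, can be sketched as:

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, e) with s = fl(a + b) and
    a + b = s + e exactly (Knuth's TwoSum). Pairs (s, e) like this are the
    building block of double-double accumulation used to stabilize
    Gram-Schmidt-style inner products."""
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e
```

A double-double value is then carried as an unevaluated pair of doubles, roughly doubling the working precision at the cost of extra floating-point operations, consistent with the runtime overhead reported above.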
Lattice QCD Calculation and Optimization on ARM Processors
SUN Wei, BI Yujiang, CHENG Yaodong
Computer Science. 2023, 50 (6): 52-57.  doi:10.11896/jsjkx.230200159
Lattice quantum chromodynamics (lattice QCD) is one of the most important large-scale parallel computing applications in high-energy physics. Research in this field usually consumes a large amount of computing resources, and its core is solving large sparse linear systems. Based on the domestic Kunpeng 920 ARM processor, this paper studies the hot spot of lattice QCD calculation, the Dslash operator, which is run on up to 64 nodes (6 144 cores) and shows linear scalability. Roofline performance analysis shows that lattice QCD is a typical memory-bound application; by compressing the 3×3 complex unitary matrices in Dslash based on symmetry, the performance of Dslash improves by 22%. For solving large sparse linear systems, the usual Krylov subspace iterative algorithms such as BiCGStab and the newly developed state-of-the-art multigrid algorithm are explored on the same ARM processor. In practical physics calculations, the multigrid algorithm is several times to an order of magnitude faster than BiCGStab, even including the multigrid setup time. Moreover, with the NEON vectorization instructions on the Kunpeng 920, there is up to 20% further improvement for the multigrid algorithm. Therefore, the use of multigrid algorithms on ARM processors can speed up physics research tremendously.
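The roofline analysis mentioned above bounds attainable performance by the smaller of peak compute throughput and memory bandwidth times arithmetic intensity; this also explains why compressing the unitary matrices (which raises flop/byte) helps a memory-bound kernel. A one-line sketch:

```python
def roofline(peak_flops, bandwidth, intensity):
    """Attainable performance under the roofline model: compute-bound at
    peak_flops, memory-bound at bandwidth * arithmetic intensity, where
    intensity is in flops per byte moved from memory."""
    return min(peak_flops, bandwidth * intensity)
```

A kernel whose `bandwidth * intensity` product sits below `peak_flops` is memory-bound, which is the regime the paper reports for Dslash.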
CP2K Software Porting and Optimization Based on Domestic c86 Processor
FAN Lilin, QIAO Yihang, LI Junfei, CHAI Xuqing, CUI Rongpei, HAN Bingyu
Computer Science. 2023, 50 (6): 58-65.  doi:10.11896/jsjkx.230200213
CP2K is currently the fastest open-source first-principles materials calculation and simulation software, and the part of its source code that calls a coprocessor is written for the CUDA architecture. Due to differences in the underlying hardware architecture and compilation environment, native CP2K cannot call the DCU accelerator on the domestic c86 processor platform, preventing cross-platform application. To solve this problem, a CP2K porting scheme for this platform is proposed. The core idea is to analyze the code of the CUDA-based DBCSR library in CP2K, disassemble the encapsulation of the corresponding structures and classes, and re-implement and package them according to the HIP programming standard. The HIP version of the DBCSR library is compiled and installed on the c86 platform and linked into CP2K, so that CP2K can run the DCU version. Two test cases are then selected and optimized at the compilation and runtime levels. It is found that removing the FFTW library automatically installed by the CP2K toolchain script improves the accuracy of the calculation results. Experimental results show that the optimized method significantly improves the computational efficiency and accuracy of CP2K, contributing to the porting, optimization and localization of open-source software for domestic platforms.
Online Service Function Chain Orchestration Method for Profit Maximization
HUANG Hua, JIANG Jun, YANG Yongkang, CAO Bin
Computer Science. 2023, 50 (6): 66-73.  doi:10.11896/jsjkx.220400156
With the development of network function virtualization, how to deploy service function chains flexibly to maximize profit has become one of the major challenges for network service providers. In this paper, the service function chain orchestration problem for multiple data centers is formulated as 0-1 integer programming with the aim of maximizing profit, and a two-stage heuristic algorithm is proposed to solve it. In the first stage, the weights of nodes and links are calculated according to load conditions and deployment cost; the service function chain is deployed on the node with the highest priority, and then the highest-priority link that meets the bandwidth constraint is selected according to the load condition. In the second stage, by analogy with the longest effective function sequence method, a virtualized network function migration strategy is proposed to reduce the consumption of deployment resources. Simulation experiments based on the NSFNET and USNET network topologies show that, compared with existing algorithms, the proposed method improves both total profit and deployment success rate.
Simulation Implementation of HHL Algorithm Based on Songshan Supercomputer System
XIE Haoshan, LIU Xiaonan, ZHAO Chenyan, LIU Zhengyu
Computer Science. 2023, 50 (6): 74-80.  doi:10.11896/jsjkx.220500108
Quantum computing is a new computing paradigm that follows the laws of quantum mechanics to manipulate quantum information units, while a quantum algorithm is composed of a series of quantum gates realized as a quantum circuit. Quantum circuits operate on qubits, using qubits as the basic storage unit and connecting quantum logic gates to achieve specific computing functions. This paper uses the MPI+OpenMP hybrid parallel programming model on the "Songshan" supercomputer to construct large-scale quantum circuits by splitting them across different nodes, which speeds up circuit construction. For inter-node communication, serialization and deserialization functions are designed to ensure data transmission between nodes. To address the exponential difference in the amount of work allocated to each node, an optimized method that splits the workload and processes each node in round-robin fashion is designed to achieve load balance between nodes. Finally, the construction of a large-scale quantum phase estimation circuit is successfully implemented on the supercomputer's CPU cluster, achieving a speedup of 8.63 over a single node. The HHL algorithm is used to verify the correctness of the designed parallel phase estimation sub-module, providing a reference for implementing the large-scale HHL algorithm on supercomputing platforms.
Study of Iterative Solution Algorithm of Response Density Matrix in Density Functional Perturbation Theory
LIU Renyu, XU Zhiqian, SHANG Honghui, ZHANG Yunquan
Computer Science. 2023, 50 (6): 81-85.  doi:10.11896/jsjkx.220500252
For the problem of calculating the response density matrix in density-functional perturbation theory (DFPT), a new parallel solution method for the Sternheimer equation is proposed: the equation is solved by the conjugate gradient algorithm and by a direct matrix decomposition algorithm, and both algorithms are implemented in the first-principles molecular simulation software FHI-aims. Experimental results show that the conjugate gradient and direct decomposition algorithms are more accurate, with less error than traditional methods, and scalable, which verifies the correctness and validity of the new Sternheimer linear-equation solvers.
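The conjugate gradient solver referred to above can be sketched for a generic symmetric positive-definite system (illustrative only; the paper's solver targets the Sternheimer equation inside FHI-aims, and `matvec` here is a stand-in for the Hamiltonian-dependent operator):

```python
def conjugate_gradient(matvec, b, tol=1e-12, max_iter=100):
    """Minimal conjugate gradient for A x = b with A symmetric positive
    definite, where matvec(p) returns A @ p. Starts from x = 0 and stops
    when the squared residual norm drops below tol."""
    x = [0.0] * len(b)
    r = list(b)                       # residual r = b - A x, with x = 0
    p = list(r)                       # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For an n×n system, CG converges in at most n iterations in exact arithmetic, which is why it suits the large sparse operators arising in DFPT.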
GPU Shared Scheduling System Under Deep Learning Container Cloud Platform
WANG Zhuang, WANG Pinghui, WANG Bincheng, WU Wenbo, WANG Bin, CONG Pengyu
Computer Science. 2023, 50 (6): 86-91.  doi:10.11896/jsjkx.220900110
In recent years, containers have gradually replaced virtual machines and, thanks to their light weight and high scalability, are widely used in deep learning cloud platforms. However, deep learning cloud platforms still have deficiencies in GPU resource management, mainly in that multiple containers cannot share GPU resources due to limitations of container orchestration technology. For small-scale model training tasks and model inference tasks, a single task cannot fully utilize the computing resources of an entire GPU card, so the current exclusive mode wastes expensive GPU resources and reduces resource efficiency and service availability. To address this problem, this paper proposes a GPU sharing scheduling system. On the one hand, the Kubernetes Operator mechanism is used to extend existing cluster functions so that multiple Pods can share GPU resources, and an agent mechanism is designed to ensure compatibility with native Kubernetes. On the other hand, based on GPU time slicing and a preemption mechanism, dynamic management and scheduling of GPU resources is realized, with fine-grained coordination among tasks to reduce interference. Experimental results show that, compared with the native Kubernetes scheduler, the proposed system reduces the completion time of a group of deep learning training tasks by about 20% on average and increases cluster GPU utilization by about 10% on average. When the GPU is shared, the performance loss of high-priority tasks is less than 5% compared with exclusive GPU use, while low-priority tasks can run on the same GPU with 20% of the performance.
Granular Computing & Knowledge Discovery
Three-way Decision Movement Strategy Based on Hierarchical Clustering
XU Yi, LUO Fan, WANG Min
Computer Science. 2023, 50 (6): 92-99.  doi:10.11896/jsjkx.220900037
Acting is an important step in the three-way decision TAO (trisecting-acting-outcome) model, and an important means of realizing object movement: by applying movement strategies, objects are moved from a disadvantageous region to an advantageous one. In recent years, scholars have proposed two kinds of movement strategies, region-based movement and object-based movement. However, both are analyzed and formulated from a single-level perspective, without considering movement strategies from a multi-level perspective. Therefore, to formulate multi-level movement strategies, this paper introduces hierarchical clustering and proposes a three-way decision movement strategy based on hierarchical clustering. Firstly, hierarchical clustering divides the objects in the disadvantageous region into different levels, with different clustering results at each level. Then, according to the criterion of the globally most frequent attribute value, a movement strategy is formulated for the clusters at each level, with different strategies for different clusters. In addition, the benefit and cost of the movement process are used to evaluate the movement strategies at different levels. Finally, experimental results demonstrate the validity of the proposed model.
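As a hedged sketch of the "globally most frequent attribute value" criterion (an illustrative reading, not the paper's exact rule; `global_mode_strategy` is hypothetical), the target value for each attribute can be taken as the value occurring most often across all objects:

```python
from collections import Counter

def global_mode_strategy(objects):
    """For each attribute column, pick the globally most frequent value.
    A movement strategy for a cluster would then change its objects'
    attributes toward these target values. Objects are equal-length tuples."""
    n_attrs = len(objects[0])
    return [Counter(obj[i] for obj in objects).most_common(1)[0][0]
            for i in range(n_attrs)]
```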
Three-way DBSCAN Algorithm Based on Local Eps
SHEN Qiuping, ZHANG Qinghua, GAO Man, DAI Yongyang
Computer Science. 2023, 50 (6): 100-108.  doi:10.11896/jsjkx.220800074
Density-based spatial clustering of applications with noise (DBSCAN) is a classical density-based clustering algorithm. It clusters datasets of arbitrary shape through two global parameters, the radius Eps and the minimum number of points MinPts, and automatically determines the number of clusters. However, with a global Eps, DBSCAN performs poorly on datasets with uneven density and cannot cluster overlapping datasets. Therefore, this paper defines a clustering principle of decreasing density and a local Eps, which is determined automatically from the k-nearest-neighbor distance, and proposes the DBSCAN algorithm based on local Eps (LE-DBSCAN). Then, by considering the labels of nearest neighbors, the border points and noise points in the two-way clustering results are re-divided, yielding the three-way DBSCAN algorithm based on local Eps (LE3W-DBSCAN). LE-DBSCAN and LE3W-DBSCAN are compared with related algorithms on UCI and artificial datasets. Experimental results show that the proposed algorithms perform well on both common hard clustering indices and soft clustering indices.
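The local-Eps idea, giving each point a radius derived from its k-nearest-neighbor distance, can be sketched as follows (illustrative; the paper's automatic determination rule may differ):

```python
import math

def local_eps(points, k):
    """Local Eps per point: the distance to its k-th nearest neighbour,
    so dense regions get a small radius and sparse regions a large one.
    Brute-force O(n^2 log n) for clarity; points are coordinate tuples."""
    eps = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        eps.append(dists[k - 1])
    return eps
```

DBSCAN's density-reachability test then uses each point's own radius instead of one global Eps, which is what lets clusters of different densities coexist.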
Boolean Matrix Representation of Triadic Concepts
WANG Xia, LI Junyu, WU Weizhi
Computer Science. 2023, 50 (6): 109-115.  doi:10.11896/jsjkx.220900111
The Boolean matrix method is introduced into triadic concept analysis to study the Boolean matrix representation of triadic contexts and triadic concepts. Firstly, the relation matrix of a triadic context is defined: the triadic context under each condition is regarded as a Boolean matrix, so the triadic context is a Boolean block matrix, which is its relation matrix. Next, the Boolean matrix representation of the induced operators on a triadic context is investigated using the relation matrix, from which Boolean matrix representations of the extent, intent and modus of a triadic concept are obtained. This method uses only basic Boolean matrix operations when generating triadic concepts and does not involve the induced operators of the triadic context. Finally, Boolean matrix representation methods are given for the enumeration method of triadic concept construction and for triadic concept construction based on object-conditional triadic concepts, respectively. Exploring triadic contexts and triadic concepts from the matrix viewpoint gives a new perspective on triadic concept analysis.
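The basic operation underlying such relation-matrix manipulations is the Boolean matrix product; a minimal sketch (generic, not the paper's specific constructions):

```python
def bool_product(A, B):
    """Boolean matrix product: (A . B)[i][j] = OR_k (A[i][k] AND B[k][j]).
    A and B are lists of rows of truthy/falsy values; the result uses
    Python booleans."""
    return [[any(A[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```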
Three-way k-means Clustering Based on Artificial Bee Colony
XU Tianjie, WANG Pingxin, YANG Xibei
Computer Science. 2023, 50 (6): 116-121.  doi:10.11896/jsjkx.220800150
Clustering plays an important role in data mining. Traditional clustering algorithms are hard clustering algorithms: an object either belongs to a class or does not. However, when dealing with uncertain data, forced division leads to decision errors. The three-way k-means clustering algorithm can reasonably divide data into groups with uncertain boundaries, but it is still sensitive to the initial cluster centers. To solve this problem, this paper presents a three-way k-means clustering algorithm based on the artificial bee colony, integrating the artificial bee colony algorithm with three-way k-means clustering. The fitness function of a honey source is constructed from an intra-class cohesion function and an inter-class dispersion function to guide the bee colony's global search for high-quality honey sources. Using the cooperation and exchange of the different roles in the bee colony, the dataset is clustered repeatedly to find the optimal honey source location, which serves as the initial cluster centers, and iterative clustering then proceeds alternately from there. Experiments on UCI datasets show that the method improves the performance indices of the clustering results and verify the effectiveness of the algorithm.
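Independent of the bee-colony initialization, the three-way assignment step that produces clusters with uncertain boundaries can be illustrated as follows (the distance-ratio threshold rule here is hypothetical, not the paper's exact formulation):

```python
import math

def three_way_assign(point, centroids, ratio=0.7):
    """Three-way assignment sketch: if the nearest centroid is clearly
    closer than the runner-up (d_nearest < ratio * d_second), put the
    object in that cluster's core region; otherwise put it in the fringe
    (boundary) region shared by the two candidate clusters."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(point, centroids[i]))
    d0 = math.dist(point, centroids[order[0]])
    d1 = math.dist(point, centroids[order[1]])
    if d0 < ratio * d1:
        return ('core', order[0])
    return ('fringe', (order[0], order[1]))
```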
Dual Three-way Concept Lattice Based on Composition of Concepts and Its Concept Reduction
LIU Jin, MI Jusheng, LI Zhongling, LI Meizheng
Computer Science. 2023, 50 (6): 122-130.  doi:10.11896/jsjkx.220800109
The three-way concept lattice represents both the information jointly possessed and the information not possessed by each other, by combining positive and negative operators; it is an extension of the classical concept lattice. However, when dealing with some practical problems, we sometimes start from the reverse direction, considering the information that complementary sets may not have and the information that they may have, which gives rise to the dual three-way concept lattice. In this paper, a novel method is proposed to construct dual three-way concept lattices based on the composition of dual concepts in a formal context and its complementary context. It is proved that the dual three-way concepts obtained by concept composition coincide with those obtained by the dual three-way operators. Then, the attribute reduction of dual three-way concept lattices based on discernibility matrices is discussed, and with this idea an approach to reducing dual three-way concepts based on concept discernibility matrices is proposed.
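The positive derivation operators of formal concept analysis, which the three-way and dual three-way operators extend, can be sketched on a context given as an object-to-attribute-set map (illustrative notation, not the paper's):

```python
def derive_attrs(objs, context):
    """A -> A': the attributes shared by every object in A (positive
    operator). `context` maps each object to its attribute set;
    `objs` is assumed non-empty."""
    return set.intersection(*(context[o] for o in objs))

def derive_objs(attrs, context):
    """B -> B': the objects possessing every attribute in B."""
    return {o for o, s in context.items() if attrs <= s}
```

A pair (A, B) with A' = B and B' = A is a formal concept; the dual three-way constructions compose such operators over the context and its complementary context.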
Optimal Scale Selection and Rule Acquisition in Inconsistent Generalized Decision Multi-scale Ordered Information Systems
YANG Ye, WU Weizhi, ZHANG Jiaru
Computer Science. 2023, 50 (6): 131-141.  doi:10.11896/jsjkx.220800149
Granular computing imitates human thinking and shows great promise as a new approach to data mining and knowledge discovery in the context of big data. To solve the problem of knowledge acquisition in inconsistent generalized decision multi-scale ordered information systems, the optimal scale combination and rule extraction in such systems are studied by employing evidence theory. Dominance relations are first introduced into decision multi-scale information systems, along with some basic concepts of decision multi-scale ordered information systems. With reference to the notion of scale combinations in inconsistent generalized decision multi-scale ordered information systems, representations of information granules as well as lower and upper approximations of sets under different scale combinations are presented and their relationships examined. With different scales of decisions, several types of optimal scale combinations in inconsistent generalized decision multi-scale ordered information systems are further defined and their relationships clarified. Finally, a discernibility-matrix attribute reduction method and rule acquisition based on generalized dominance decision functions are explored.
Database & Big Data & Data Science
Intelligent Mapping Recommendation-based Knowledge Graph Instance Construction and Evolution Method
ZHANG Yaqing, SHAN Zhongyuan, ZHAO Junfeng, WANG Yasha
Computer Science. 2023, 50 (6): 142-150.  doi:10.11896/jsjkx.230300071
With the development of big data technology, large amounts of heterogeneous data have been generated in various fields, and constructing knowledge graphs is an important means of achieving semantic interoperation of heterogeneous data. Generating the instance model by mapping structured data onto an ontology model is a common method. However, most existing construction methods require users to complete all mapping matches manually; the mapping operation is time-consuming and error-prone, with no intelligent matching. In addition, existing methods do not support incremental updates of instances. This paper analyzes existing instance construction methods and proposes an instance construction and evolution method based on intelligent mapping recommendation to relieve the burden of manual mapping. Before the user maps manually, a mapping reuse recommendation mechanism performs multilevel similarity calculation, including element-level similarity, table-level similarity and inter-table propagation similarity, and generates recommended mappings ranked by match score. In addition, an incremental discovery mechanism automatically detects redundant and conflicting instances and generates background tasks to process them, achieving efficient, duplicate-free import of instances. Experiments are carried out on the Shandong government open dataset and a Shenzhen medical emergency dataset. With the mapping reuse recommendation module, interaction time is 3~4 times shorter than in the traditional mode, and the matching accuracy of field recommendation reaches 98.1%. In the incremental discovery experiment, the time required to import 13.94 million instance nodes and 21.58 million relationship edges is reduced from 31.21 h to 2.23 h, which demonstrates the availability and matching accuracy of intelligent mapping reuse recommendation and improves the efficiency of instance-layer construction and growth.
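Element-level similarity, one of the three levels in the recommendation mechanism, can be illustrated with simple string similarity over field names (a sketch only; `recommend_mapping` and its ranking are not from the paper, which also combines table-level and inter-table propagation similarity):

```python
from difflib import SequenceMatcher

def recommend_mapping(field, ontology_terms):
    """Rank candidate ontology terms for a source field by case-insensitive
    string similarity, highest score first. Returns (term, score) pairs."""
    scored = [(t, SequenceMatcher(None, field.lower(), t.lower()).ratio())
              for t in ontology_terms]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```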
Noise Estimation and Filtering Methods with Limit Distance
JIANG Gaoxia, QIN Pei, WANG Wenjian
Computer Science. 2023, 50 (6): 151-158.  doi:10.11896/jsjkx.220600130
Machine learning has made remarkable progress in recent years and has been successfully applied in many fields. However, many learning models and algorithms depend heavily on data quality. Complex label noise exists in a large number of practical datasets, so machine learning faces severe challenges in low-quality data modeling and label noise handling. To address the numerical label noise problem in regression, this paper studies the correlation between the label estimation interval and the noise from the perspectives of theoretical analysis and simulation experiments, and proposes a limit-distance noise estimation method. Under the optimal sample selection framework, a limit distance noise filtering (LDNF) algorithm is proposed based on this noise estimator. Experimental results show that the proposed noise estimation method has higher correlation with the true label noise and lower estimation bias. On benchmark datasets and real age-estimation datasets, LDNF effectively identifies label noise and reduces the model's test error in different noise environments, outperforming other recent filtering algorithms.
Filtered Feature Selection Algorithm Based on Persistent Homology
YIN Xingzi, PENG Ningning, ZHAN Xueyan
Computer Science. 2023, 50 (6): 159-166.  doi:10.11896/jsjkx.220500169
Abstract PDF(3102KB) ( 360 )   
References | Related Articles | Metrics
Existing filtering feature selection algorithms ignore the correlation between features. This paper proposes a new filtering feature selection algorithm based on persistent homology (the Rel-Betti algorithm), which accounts for the correlation between features and their combined effect. A new notion, the relevant Betti numbers, is defined to filter out the important topological features in a dataset. The algorithm first preprocesses the dataset, partitions it by class label, computes the relevant Betti numbers within each class, obtains the feature means of the data information, and ranks feature importance by the difference of feature means. On eight UCI datasets, the algorithm is compared with other common algorithms under four learning models: decision tree, random forest, K-nearest neighbor and support vector machine. Experimental results show that the Rel-Betti algorithm is an effective method that improves classification accuracy and F1 score and does not depend on a specific machine learning model.
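As a rough illustration of the topological quantities involved (not the paper's Rel-Betti algorithm), the zeroth Betti number of a point cloud at a fixed scale counts connected components of the Vietoris-Rips graph and can be computed with a union-find sketch; the point set and scale below are hypothetical examples.

```python
import numpy as np

def betti_0(points, eps):
    """Number of connected components (Betti-0) of the
    Vietoris-Rips graph at scale eps, via union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(n)})

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
```

Sweeping `eps` and recording how Betti-0 changes is the basic idea behind a persistence-style feature summary.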
Multimodal Data Fusion Algorithm Based on Hypergraph Regularization
CUI Bingjing, ZHANG Yipu, WANG Biao
Computer Science. 2023, 50 (6): 167-174.  doi:10.11896/jsjkx.220900144
Abstract PDF(3087KB) ( 328 )   
References | Related Articles | Metrics
Multi-modal data fusion improves classification and prediction performance by learning the correlated and complementary information among multiple datasets. However, existing fusion methods learn feature patterns from each dataset individually and ignore the structural information among heterogeneous datasets. This paper proposes a multi-modal data fusion algorithm based on hypergraph regularization (sHMF), which captures high-order relationships within and across modalities, i.e., homogeneous and heterogeneous high-order networks, by combining hypergraph and manifold regularization. Specifically, it first generates a hypergraph similarity matrix to represent the high-order relationships among subjects, using a sparse representation to build the hypergraph and reduce redundant hyperedges. sHMF is validated on simulated data and real imaging-genetic data of schizophrenia patients. Experimental results show that the algorithm outperforms several widely used methods in classification accuracy on both simulated and real data, and reveals biomarkers significantly associated with schizophrenia as well as potential links between risk genes, methylation factors and abnormal brain regions.
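A minimal sketch of what a hypergraph similarity matrix can look like (one common construction, not necessarily the one used in sHMF): from a binary incidence matrix H, subjects are similar in proportion to the hyperedges they share, normalized by hyperedge degree. The incidence matrix below is a hypothetical toy example.

```python
import numpy as np

def hypergraph_similarity(H, w=None):
    """S = H W De^{-1} H^T, where H is an n_subjects x n_edges
    binary incidence matrix, W holds hyperedge weights and De
    the hyperedge degrees. Shared hyperedges raise similarity."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    de = H.sum(axis=0)                  # hyperedge degrees
    S = H @ np.diag(w / de) @ H.T
    return S

# 3 subjects, 2 hyperedges: e0 = {0, 1}, e1 = {1, 2}
H = np.array([[1, 0],
              [1, 1],
              [0, 1]])
S = hypergraph_similarity(H)
```

Subject 1 sits in both hyperedges, so it is similar to both others, while subjects 0 and 2 share no hyperedge and get zero similarity.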
Construction and Automatic Classification of CS1 Test Questions Dataset Based on Bloom's Taxonomy
DONG Rongsheng, WEI Chenyu, HU Jie, QIAO Yucheng, LI Fengying
Computer Science. 2023, 50 (6): 175-182.  doi:10.11896/jsjkx.230200182
Abstract PDF(1549KB) ( 218 )   
References | Related Articles | Metrics
Curriculum evaluation is a key link in teaching reform, covering the evaluation of teaching cases, test questions and classroom teaching. To evaluate test questions for computing courses, this paper introduces Bloom's taxonomy and takes the test questions of the "Introduction to Computer Science" course (CS1) at Princeton University and Guilin University of Electronic Technology as the corpus. Verb and noun seed banks are given for the cognitive process dimension and knowledge dimension of Bloom's taxonomy for CS1, the positions in the two-dimensional Bloom matrix reachable by each test question are manually labeled, and a classification dataset of CS1 test questions is constructed. Using machine learning, an automatic classification model for CS1 test questions, TFERNIE-LR, is given, composed of the CSTFPOS-IDF algorithm, the ERNIE model and an LR classifier. CSTFPOS-IDF extends the TFPOS-IDF algorithm with a weight factor for keywords of the computing discipline, so that such keywords receive greater attention when word weights are generated. At the same time, the entity-knowledge-enhanced pre-training model ERNIE produces word-level embeddings of the test questions, and the word weights and word-level vectors are combined to generate the text vector of each test question for automatic classification. Finally, the LR classifier assigns each test question to a cell of Bloom's two-dimensional taxonomy matrix. Experimental results show that the proposed TFERNIE-LR model performs well, with weighted-P reaching 83.3% and 96.1% in the cognitive process dimension and knowledge dimension, respectively.
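To illustrate the general idea of boosting discipline keywords in a TF-IDF scheme (a simplified sketch, not the paper's CSTFPOS-IDF formula, which also uses part-of-speech information), plain TF-IDF can be given a multiplicative weight factor for terms in a keyword set; the documents, keyword set and boost value below are hypothetical.

```python
import math
from collections import Counter

def weighted_tfidf(docs, keywords, boost=2.0):
    """TF-IDF where terms in `keywords` (e.g. computing-discipline
    vocabulary) get an extra multiplicative weight factor."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document freq.
    out = []
    for d in docs:
        tf = Counter(d)
        vec = {}
        for t, c in tf.items():
            idf = math.log(n / df[t]) + 1.0
            w = (c / len(d)) * idf
            vec[t] = w * (boost if t in keywords else 1.0)
        out.append(vec)
    return out

docs = [["sort", "array", "loop"], ["array", "index"], ["essay", "style"]]
vecs = weighted_tfidf(docs, keywords={"array"})
```

With the boost, "array" outweighs "loop" in the first document even though "array" appears in more documents and so has a lower IDF.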
Online Semi-supervised Cross-modal Hashing Based on Anchor Graph Classification
QIN Liang, XIE Liang, CHEN Shengshuang, XU Haijiao
Computer Science. 2023, 50 (6): 183-193.  doi:10.11896/jsjkx.220400038
Abstract PDF(3824KB) ( 289 )   
References | Related Articles | Metrics
In recent years, hashing algorithms have attracted wide attention for efficient cross-modal retrieval of large-scale multimedia data, owing to their low storage cost and high retrieval speed. Most existing cross-modal hashing algorithms are supervised or unsupervised, and supervised methods usually achieve better performance. In real-world applications, however, requiring all data to be labeled is not feasible. In addition, most of these methods are offline; they incur high training costs and are inefficient when faced with large streams of input data. This paper proposes a new semi-supervised cross-modal hashing method, online semi-supervised anchor graph cross-modal hashing (OSAGCH), which builds a semi-supervised anchor graph cross-modal hashing model. It uses regularized anchor graphs to predict labels when only part of the data is labeled, and uses subspace relationship learning to learn hash functions, generating a unified hash code in one step. The model is then extended to an online version that can process streaming data. Experiments on public multi-modal datasets indicate that the proposed method outperforms existing methods.
Computer Graphics & Multimedia
PSwin:Edge Detection Algorithm Based on Swin Transformer
HU Mingyang, GUO Yan, JIN Yangshuang
Computer Science. 2023, 50 (6): 194-199.  doi:10.11896/jsjkx.220700145
Abstract PDF(2106KB) ( 461 )   
References | Related Articles | Metrics
As a traditional computer vision task, edge detection has been widely used in real-world scenarios such as license plate recognition and optical character recognition, and it serves as the basis for higher-level algorithms such as object detection and semantic segmentation. Edge detection can also be applied to urban security, autonomous driving and other fields. A good edge detection algorithm can effectively improve the efficiency and accuracy of the above computer vision tasks. The difficulty of edge extraction lies in the varying size of targets and differences in edge detail, so an edge extraction algorithm must handle edges of different scales effectively. In this paper, the Transformer is applied to the edge extraction task for the first time, and a novel feature pyramid network is proposed to make full use of the multi-scale, multi-level features of the backbone network. PSwin uses a self-attention mechanism, which can extract global structural information from images more efficiently than convolutional neural network architectures. When evaluated on the BSDS500 dataset, the proposed PSwin edge detection algorithm achieves the best performance, with an ODS F-measure of 0.826 and an OIS of 0.841.
Adaptive Image Dehazing Algorithm Based on Dynamic Convolution Kernels
LIU Zhe, LIANG Yudong, LI Jiaying
Computer Science. 2023, 50 (6): 200-208.  doi:10.11896/jsjkx.220400288
Abstract PDF(3864KB) ( 378 )   
References | Related Articles | Metrics
Existing image dehazing methods generally suffer from incomplete dehazing and color distortion. Dehazing methods based on traditional deep learning models mostly use static inference at test time, applying the same fixed parameters to different samples, which limits the expressive ability of the model and degrades dehazing performance. To address these problems, this paper proposes an adaptive image dehazing algorithm based on dynamic convolution kernels. The proposed model consists of three parts: an encoding network, an adaptive feature enhancement network and a decoding network. Dynamic convolutions, dense residual connections and attention mechanisms are combined to build the adaptive feature enhancement network, which mainly comprises dynamic residual components and a dynamic skip-connected feature fusion component. The dynamic residual component is composed of a dynamic residual dense block, a convolutional layer and a dual attention module. The dynamic residual dense block introduces dynamic convolutions into the residual dense block, with an attention-based weight dynamic aggregator that generates adaptive convolution kernel parameters; the dynamic convolutions reduce information loss and enhance the expressive ability of the model. The dual attention module combines channel attention and pixel attention so that the model pays more attention to differences between image channels and to areas with unevenly distributed haze. The dynamic skip-connected feature fusion component learns rich contextual information by dynamically fusing features from different stages via skip connections, preventing early features from being forgotten as information flows into deeper layers. The enriched feature representations benefit the restoration of detail in haze-free images. Extensive experiments on synthetic and real datasets show that the proposed method not only achieves better objective evaluation scores but also reconstructs dehazed images with better visual effects, surpassing the compared methods.
Hyperspectral Image Denoising Based on Group Sparse and Constraint Smooth Rank Approximation
ZHANG Lihong, YE Jun
Computer Science. 2023, 50 (6): 209-215.  doi:10.11896/jsjkx.220300236
Abstract PDF(3589KB) ( 317 )   
References | Related Articles | Metrics
Many kinds of noise arise during hyperspectral image (HSI) acquisition, and the more noise there is, the less effective information the HSI retains. To recover the effective information of an HSI from heavy mixed noise, a constrained smooth rank approximation method for HSI recovery based on group sparse regularization is proposed. The group sparse regularizer is defined as spatial-spectral total variation (SSTV) based on a weighted $\ell_{2,1}$-norm. This regularization not only utilizes spatial-spectral information but also exploits the group sparsity inside the HSI, which enhances the removal of mixed noise and the smoothness in the spatial-spectral dimensions. In addition, a constrained smooth function is used to approximate the rank function, which better exploits the low-rank property of HSI and improves the efficiency of the algorithm. The optimization problem is solved by an iterative algorithm based on the alternating direction method of multipliers. Results of two simulated-data experiments and one real-data experiment show that, compared with five current mainstream methods, the proposed method achieves clear improvement in both visual effect and evaluation indices.
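The building block of such a regularizer can be sketched in a few lines (a simplified toy, not the paper's exact SSTV definition or weighting): a weighted $\ell_{2,1}$-norm sums the weighted Euclidean norms of row groups, and applying it to spatial finite differences of an image cube penalizes non-smoothness jointly across all bands.

```python
import numpy as np

def weighted_l21(D, w=None):
    """Weighted l2,1 norm: sum_i w_i * ||row_i||_2.
    Rows are treated as groups, encouraging joint sparsity."""
    w = np.ones(D.shape[0]) if w is None else w
    return float(np.sum(w * np.linalg.norm(D, axis=1)))

def sstv(X):
    """Toy spatial-spectral TV: l2,1 norm of finite differences
    along the two spatial axes of a (h, w, bands) cube, with each
    pixel's cross-band difference vector as one group."""
    dx = np.diff(X, axis=0).reshape(-1, X.shape[2])
    dy = np.diff(X, axis=1).reshape(-1, X.shape[2])
    return weighted_l21(dx) + weighted_l21(dy)

flat = np.ones((4, 4, 3))  # spatially constant cube -> zero TV
```

A spatially constant cube has zero TV, while noisy pixels create large difference groups that the regularizer penalizes.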
GAN-generated Face Detection Based on Space-Frequency Convolutional Neural Network
WANG Jinwei, ZENG Kehui, ZHANG Jiawei, LUO Xiangyang, MA Bin
Computer Science. 2023, 50 (6): 216-224.  doi:10.11896/jsjkx.220400268
Abstract PDF(3227KB) ( 309 )   
References | Related Articles | Metrics
The rapid development of generative adversarial networks (GANs) has led to unprecedented success in image generation. New GANs such as StyleGAN make generated images more realistic and deceptive, posing a greater threat to national security, social stability and personal privacy. In this paper, a detection algorithm based on a space-frequency joint two-stream convolutional neural network is proposed. Since the up-sampling in the GAN generation process leaves clearly discernible artifacts on the spectrum, a learnable frequency-domain filter kernel and a frequency-domain network are designed to fully learn and extract frequency-domain features. To compensate for the information discarded in the transformation to the frequency domain, a spatial-domain network is also designed to learn discriminative spatial features of the image content itself. Finally, the two features are fused to detect GAN-generated face images. Experimental results on multiple datasets show that the proposed model outperforms existing algorithms in detection accuracy on high-quality generated datasets and in generalization across datasets, and is more robust to JPEG compression, random cropping, Gaussian blur and other operations. The proposed method also performs well on a locally generated GAN face dataset, further demonstrating its generality and broad application prospects.
Stratified Pseudo-label Based Image Clustering
CAI Shaotian, CHEN Xiaojun, CHEN Longteng, QIU Liping
Computer Science. 2023, 50 (6): 225-235.  doi:10.11896/jsjkx.220900197
Abstract PDF(2941KB) ( 264 )   
References | Related Articles | Metrics
Image clustering is an important open problem in image processing. Recently, some methods combine the powerful representation ability of contrastive learning for end-to-end clustering and use pseudo-label techniques to improve robustness. Existing pseudo-label methods need a large threshold parameter to select highly confident samples for one-hot pseudo-labels, and often cannot obtain enough such samples. To remedy these defects, we propose a stratified pseudo-label clustering (SPC) method, which trains and refines the classification model using both structural information and pseudo-labels. We first introduce three assumptions for designing deep clustering methods: the local smoothness assumption, the self-training assumption and the low-density separation assumption. The method consists of two stages: 1) manifold-based consistency learning, which initializes the classification model in the training stage; and 2) stratified pseudo-label based model refinement, which generates stratified pseudo-labels to improve the robustness of the clustering model. We first generate a strong pseudo-label dataset and a weak pseudo-label dataset with a threshold parameter, then propose a label propagation method and a mix-up method to improve the weak pseudo-label dataset. Finally, both datasets are used to refine the clustering model. Compared with the best baseline, the average ACC of SPC improves by 7.6% and 5.0% on the STL10 and CIFAR100-20 benchmarks, respectively.
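The stratification step described above can be sketched as a confidence split (a simplified reading; the thresholds and the exact selection rule here are hypothetical, not the paper's): predictions above a strong threshold yield one-hot pseudo-labels, a middle band yields weak pseudo-labels to be refined by label propagation or mix-up, and the rest are left unlabeled.

```python
import numpy as np

def stratify_pseudo_labels(probs, t_strong=0.95, t_weak=0.7):
    """Split model predictions into strong / weak pseudo-label
    index sets by confidence. probs: (n_samples, n_classes)."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    strong = np.where(conf >= t_strong)[0]
    weak = np.where((conf >= t_weak) & (conf < t_strong))[0]
    return strong, weak, labels

probs = np.array([[0.98, 0.02],   # confident -> strong set
                  [0.80, 0.20],   # middling  -> weak set
                  [0.55, 0.45]])  # uncertain -> neither
strong, weak, labels = stratify_pseudo_labels(probs)
```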
Study on Volume Cloud Simulation Based on Weather Data and Multi-noise Fusion
LU Chunhai, XU Xinhai, ZHANG Shuai, LI Hao
Computer Science. 2023, 50 (6): 236-242.  doi:10.11896/jsjkx.220500070
Abstract PDF(3009KB) ( 291 )   
References | Related Articles | Metrics
To build a realistic simulation environment for an intelligent drone swarm simulation system, clouds must be modeled and rendered from weather data. Current cloud simulations based on real weather data generally adopt physical modeling methods, such as solving the Navier-Stokes equations or particle systems; these are burdened by heavy numerical equation solving, incur large computational cost, and cannot achieve real-time simulation in large-scale scenes. To address this problem, a volumetric cloud modeling method is proposed. First, weather data is used to generate a texture; then height-dependent functions define how cloud shape and density change with altitude; finally, the cloud is modeled in combination with multiple noise functions, effectively combining weather data with non-physical modeling. In rendering, a raymarching algorithm accumulates cloud density along the view direction and toward the sun, and, combined with the laws of light absorption and scattering, computes the color and transparency of each sample point to draw the cloud. Experiments show that the simulated volumetric clouds are consistent with the cloud information in the weather data, are highly efficient, and are close to real clouds in shape and color.
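The density accumulation at the heart of such a raymarcher can be sketched as follows (a minimal 1-D sketch of standard Beer-Lambert absorption along one ray, not the paper's full lighting model; the extinction coefficient and sample densities are hypothetical):

```python
import math

def raymarch(densities, step, sigma=1.0):
    """Accumulate opacity along a ray of density samples with
    Beer-Lambert absorption: per step, a = 1 - exp(-sigma*d*step),
    transmittance T *= (1 - a). Returns (opacity, transmittance)."""
    T = 1.0      # fraction of background light still visible
    alpha = 0.0  # accumulated opacity seen by the eye
    for d in densities:
        a = 1.0 - math.exp(-sigma * d * step)
        alpha += T * a   # contribution attenuated by what is in front
        T *= 1.0 - a
    return alpha, T

alpha, T = raymarch([0.0, 0.0, 0.0], step=0.1)  # empty ray
```

An empty ray stays fully transparent, while marching through a dense cloud drives the transmittance toward zero, which is what makes thick clouds opaque in the rendered image.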
Artificial Intelligence
Chinese Medical Named Entity Recognition Based on Multi-feature Embedding
HUANG Jiange, JIA Zhen, ZHANG Fan, LI Tianrui
Computer Science. 2023, 50 (6): 243-250.  doi:10.11896/jsjkx.220400115
Abstract PDF(1531KB) ( 270 )   
References | Related Articles | Metrics
Chinese medical named entity recognition (NER) models based on character representations suffer from single-source embedding information and a lack of word boundary and text structure information. This paper presents a medical NER model that integrates multi-feature embeddings. First, characters are mapped to fixed-length embeddings. Second, external resources are introduced to construct lexical features that supplement the potential phrase information of characters. Third, according to the pictographic nature of Chinese characters and the characteristics of text sequences, character structure features and sequence structure features are introduced; convolutional neural networks encode these two structural features to obtain radical-level and sentence-level embeddings. Finally, the multiple feature embeddings are concatenated, fed into a long short-term memory network for encoding, and the entity results are output by a CRF layer. On a self-built Chinese medical dataset and the CHIP_2020 dataset, experimental results show that, compared with benchmark models, the proposed model integrating lexical and text structure features can effectively identify named entities in the medical field.
Text Classification Method Based on Anti-noise and Double Distillation Technology
GUO Wei, HUANG Jiahui, HOU Chenyu, CAO Bin
Computer Science. 2023, 50 (6): 251-260.  doi:10.11896/jsjkx.220500100
Abstract PDF(1892KB) ( 325 )   
References | Related Articles | Metrics
Text classification is an important classic problem in natural language processing, often used in news classification, sentiment analysis and other scenarios. Existing deep learning based classification methods have three problems: 1) real-world datasets contain many noisy labels, and training directly on such data seriously degrades model performance; 2) pre-training models improve classification accuracy, but model size and inference cost also increase significantly, making them hard to use on resource-limited devices; 3) pre-training models contain much redundant computation, leading to low prediction efficiency on large data volumes. To address these issues, this paper proposes a text classification method combining anti-noise and double distillation (knowledge distillation and self-distillation). A threshold-based anti-noise method built on confidence learning and a new active learning sample selection algorithm improve data quality at a small labeling cost. Meanwhile, the combination of knowledge distillation and self-distillation reduces model size and redundant computation, so that inference speed can be flexibly adjusted on demand. Extensive experiments on real datasets evaluate the proposed method. Results show that accuracy after anti-noise processing increases by 1.18%, and the method can run 4 to 8 times faster than BERT with a small loss in accuracy.
Extrapolation Accelerated Linear Alternating Direction Multiplier Method for Split Feasibility Problems and Its Global Convergence
LIU Yang, XUE Zhonghui, WANG Yongquan, CAO Yongsheng
Computer Science. 2023, 50 (6): 261-265.  doi:10.11896/jsjkx.230100009
Abstract PDF(1283KB) ( 219 )   
References | Related Articles | Metrics
This paper presents a linearized alternating direction method of multipliers (ADMM) for the split feasibility problem (SFP). The SFP is formulated as a separable convex minimization problem with linear constraints, and an extrapolation-accelerated linearized ADMM is proposed that exploits the separable structure, yielding subproblems with closed-form solutions. The global convergence of the proposed method is proved under suitable conditions, and the algorithm is tested on two SFP examples that illustrate the theoretical results.
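The SFP (find x in C with Ax in Q) and its separable ADMM formulation can be sketched numerically (a basic linearized ADMM without the paper's extrapolation acceleration; the sets, matrix and step sizes below are hypothetical toy choices):

```python
import numpy as np

def sfp_admm(A, proj_C, proj_Q, x0, rho=1.0, tau=None, iters=200):
    """Linearized ADMM sketch for the split feasibility problem,
    written as  min I_C(x) + I_Q(y)  s.t.  A x = y  (scaled dual u).
    The x-subproblem is linearized so both steps are projections."""
    m, n = A.shape
    tau = tau or 1.0 / np.linalg.norm(A, 2) ** 2  # step for x-update
    x = x0.copy()
    y = A @ x
    u = np.zeros(m)
    for _ in range(iters):
        # gradient step on the augmented term, then project onto C
        x = proj_C(x - tau * rho * (A.T @ (A @ x - y + u)))
        y = proj_Q(A @ x + u)   # closed-form: projection onto Q
        u = u + A @ x - y       # dual ascent
    return x

A = np.array([[1.0, 1.0]])
proj_C = lambda v: np.clip(v, 0.0, 1.0)   # C = [0, 1]^2
proj_Q = lambda v: np.clip(v, 0.5, 1.5)   # Q = [0.5, 1.5]
x = sfp_admm(A, proj_C, proj_Q, np.zeros(2))
```

Starting from an infeasible point, the iterates settle on an x in C whose image Ax lands inside Q, which is exactly the SFP requirement.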
Self Reconfiguration Algorithm of Modular Robot Based on Swarm Agent Deep Reinforcement Learning
WANG Hanmo, ZHENG Shijie, XU Ruonan, GUO Bin, WU Lei
Computer Science. 2023, 50 (6): 266-273.  doi:10.11896/jsjkx.230300044
Abstract PDF(2319KB) ( 356 )   
References | Related Articles | Metrics
Modular robots are composed of a number of standard modules with independent functions. Self-reconfiguration is currently a hot and difficult problem in modular robot research. For complex instances, traditional graph-theoretic or search algorithms cannot find the optimal solution in polynomial time, and complexity grows exponentially with the number of modules. From the perspective of swarm-agent deep reinforcement learning, this work regards each isomorphic module as a single agent with learning and perception abilities, and proposes a modular robot self-reconfiguration algorithm based on QMIX. A new reward function is designed, and parallel movement of agents is realized on the basis of a restricted action space, which to some extent solves the coordination and cooperation problem among multiple agents, thereby realizing the transition from the initial configuration to the target configuration. In experiments with 9 modules, the success rate and average number of steps are compared with a traditional A*-based search algorithm. Results show that with a reasonable time-step limit, the success rate of the QMIX-based self-reconfiguration algorithm reaches more than 95%, the average number of steps of both algorithms is about 12, and the QMIX algorithm approaches the effect of the traditional algorithm.
Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning
MIAO Kuan, LI Chongshou
Computer Science. 2023, 50 (6): 274-282.  doi:10.11896/jsjkx.220900112
Abstract PDF(1664KB) ( 301 )   
References | Related Articles | Metrics
In recent years, research on solving job shop scheduling problems with deep reinforcement learning has concentrated on construction techniques, which model scheduling as a sequential choice problem and gradually select scheduling nodes to form a complete solution. Although this line of work has produced impressive results, it still suffers from complicated reward formulation and poor solution quality, which hinders further development. In this study, we design a reinforcement learning construction framework based on graph neural networks and the proximal policy optimization algorithm, and propose an innovative, efficient search correction mechanism with a modified swap operator to enhance solution quality. It searches the neighborhood of a known solution with a Monte Carlo tree, correcting the suboptimal choices caused by the discrepancy between training and testing data. The proposed algorithm is comprehensively evaluated on public and synthetic datasets. Experimental results demonstrate that it outperforms state-of-the-art reinforcement learning frameworks on both small and medium-sized instances. It not only exploits the rapid solution of the constructive reinforcement learning framework, but also effectively corrects suboptimal choices through the correction mechanism, reducing the worst-case maximum completion time.
Computer Network
Function Level Code Vulnerability Detection Method of Graph Neural Network Based on Extended AST
GU Shouke, CHEN Wen
Computer Science. 2023, 50 (6): 283-290.  doi:10.11896/jsjkx.220600131
Abstract PDF(2037KB) ( 334 )   
References | Related Articles | Metrics
As software vulnerabilities increase year by year, security problems are becoming more serious. Vulnerability detection on source code at the delivery stage of a software project can prevent security vulnerabilities at run time, and discovering code vulnerabilities depends on effective code characterization. Traditional characterization methods based on software metrics correlate weakly with vulnerabilities, so it is difficult to characterize vulnerability information efficiently. In recent years, machine learning has provided a new approach for intelligent vulnerability discovery, but such methods can miss key information in code features. To solve these problems, control flow edges, data flow edges and next-token edges are added to the traditional abstract syntax tree (AST) to generate an extended abstract syntax tree (EXAST) graph structure that better captures code structure information, and the word2vec embedding model initializes code tokens as numerical vectors that the machine can recognize and learn from. At the same time, a gated recurrent unit (GRU) is introduced into the traditional graph neural network (GNN) to alleviate vanishing gradients and enhance the propagation of long-term information in the graph, strengthening the temporal relationships of code execution and improving the accuracy of vulnerability detection. Finally, the model is trained and tested on the SARD dataset for function-level code vulnerability detection, improving accuracy by 32.54% and the F1 score by 44.99 compared with traditional vulnerability detection methods. Experimental results confirm the effectiveness of the method.
Quantum Auction Protocol Based on Semi-quantum Private Comparison
YANG Han, FENG Yan, XIE Sijiang
Computer Science. 2023, 50 (6): 291-296.  doi:10.11896/jsjkx.220500063
Abstract PDF(1496KB) ( 222 )   
References | Related Articles | Metrics
A quantum sealed-bid auction protocol using high-energy single particles as information carriers, based on semi-quantum private comparison, is proposed to address the insufficient privacy protection for bids, untrustworthy third parties, and high quantum-capability requirements placed on participants in existing quantum sealed-bid auction protocols. The protocol involves no third party and adopts a semi-quantum approach: only the auctioneer needs full quantum capability, while bidders only need the ability to reflect particles and prepare single particles. Bid privacy is achieved through semi-quantum private comparison, in which the auctioneer learns only the order relation between bids, not the bids themselves. Theoretical analysis shows that the protocol is highly secure and resists measure-resend, intercept-resend, entanglement and collusion attacks. The communication efficiency of the protocol is stable and unaffected by the number of bidders.
Machine Learning Based Environment Assumption Automatic Generation for Compositional Verification of SCADE Models
ZHANG Zelun, YANG Zhibin, LI Xiaojie, ZHOU Yong, LI Wei
Computer Science. 2023, 50 (6): 297-306.  doi:10.11896/jsjkx.220500207
Abstract PDF(3264KB) ( 223 )   
References | Related Articles | Metrics
The safety-critical application development environment (SCADE) is a common industrial tool for modeling, simulation, testing and verification of safety-critical software. Solving the state space explosion problem in formal verification of SCADE models is an important challenge. Contract-based compositional verification addresses this by learning the context and external environment of each software component and then using environment assumptions to constrain the component state space, but obtaining environment assumptions manually is time-consuming and labor-intensive. This paper proposes an automatic environment assumption generation method for SCADE models. First, an automatic simulation method generates the dataset required by the machine learning step from the SCADE model. Second, a decision tree and a genetic algorithm generate environment assumptions automatically. Finally, a prototype tool for SCADE model analysis and automatic environment assumption generation is implemented, and an ejection seat control system is used as a case study to verify the effectiveness of the proposed method.
Multi-factor Blockchain Private Key Protection Scheme Based on Secret Sharing
XIAO Jian, YANG Min
Computer Science. 2023, 50 (6): 307-312.  doi:10.11896/jsjkx.220600069
Abstract PDF(1489KB) ( 232 )   
References | Related Articles | Metrics
Because the blockchain lacks a recovery mechanism, a user's private key is difficult to retrieve once lost. To address this, a multi-factor blockchain private key protection scheme based on passwords, secret questions and fingerprints is proposed. The scheme requires users to store no additional information, can be implemented completely online, and adopts a forgetting-tolerant factor access strategy. During registration, users provide all factor information (password, secret question and fingerprint) and the blockchain private key, and a secret sharing scheme assigns a secret share to each server in a group. During recovery, users need only provide some of the factors and send recovery requests to multiple servers to obtain their secret shares and reconstruct the blockchain private key. Experimental results and heuristic security analysis show that the computational cost of both client and server is on the order of milliseconds, and that supporting multiple factors allows the scheme to resist known attacks and provide better security.
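The secret sharing primitive underlying such a scheme is classically Shamir's (k, n) threshold scheme, sketched below; the abstract does not specify which scheme the authors use, and the field prime and parameters here are illustrative.

```python
import random

P = 2**127 - 1  # a Mersenne prime; arithmetic is over GF(P)

def make_shares(secret, k, n):
    """Split `secret` into n shares, any k of which reconstruct it:
    evaluate a random degree-(k-1) polynomial with constant term
    `secret` at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(i, sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(123456789, k=3, n=5)
```

Any 3 of the 5 shares recover the key, so losing a factor (or a server) does not lose the secret, while fewer than 3 shares reveal nothing about it.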
LTTFAD:Log Template Topic Feature-based Anomaly Detection
SUN Xuekui, DAI Hua, ZHOU Jianguo, YANG Geng, CHEN Yanli
Computer Science. 2023, 50 (6): 313-321.  doi:10.11896/jsjkx.220500020
Abstract PDF(2495KB) ( 356 )   
References | Related Articles | Metrics
In system security, detecting software and system anomalies from logs is a popular approach. With the rapid development of software and hardware, manual detection on logs of such huge scale is impractical, and there has been much research on automatic log anomaly detection. Existing approaches are based on log templates and become unstable when templates are modified. This paper proposes a log anomaly detection model based on the topic features of log templates. It uses an LDA topic model to extract topic features of log templates and performs anomaly detection with an LSTM recurrent neural network. Experimental results show that the proposed model outperforms existing models on the HDFS and OpenStack datasets in most metrics, such as precision, recall and F1 score. In addition, the LTTFAD model remains highly stable under the injection of new log templates.
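The common detection rule in this family of models can be sketched without the neural network (a frequency-count predictor stands in for the LSTM here; the window size, top-k and sequences are hypothetical): predict candidates for the next log template from the preceding window, and flag a sequence as anomalous when the observed next template falls outside the model's top-k candidates.

```python
from collections import Counter, defaultdict

def train_next_template(seqs, window=2):
    """Count which template id follows each window of template ids
    (a simple stand-in for a learned sequence predictor)."""
    model = defaultdict(Counter)
    for seq in seqs:
        for i in range(len(seq) - window):
            model[tuple(seq[i:i + window])][seq[i + window]] += 1
    return model

def is_anomalous(model, seq, window=2, top_k=2):
    """Flag the sequence if any next template falls outside the
    model's top-k candidates for its preceding window."""
    for i in range(len(seq) - window):
        ctx = tuple(seq[i:i + window])
        cand = [t for t, _ in model[ctx].most_common(top_k)]
        if seq[i + window] not in cand:
            return True
    return False

normal = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 1]]
model = train_next_template(normal)
```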
Raft Consensus Algorithm Based on Credit Evaluation Model
LIU Wei, GUO Lingbei, XIA Yujie, SHE Wei, TIAN Zhao
Computer Science. 2023, 50 (6): 322-329.  doi:10.11896/jsjkx.220500171
Abstract PDF(2853KB) ( 272 )   
In the Internet of Vehicles, traffic information must be shared and exchanged between vehicle nodes. However, at present there are still problems such as the difficulty of efficiently updating traffic data between vehicle nodes and the dissemination of false information by malicious vehicle nodes. To address these problems, this paper proposes a Raft consensus algorithm based on a credit evaluation model (CE-Raft). First, the credit evaluation model is constructed: Byzantine vehicle nodes are detected and excluded with the isolation forest anomaly detection algorithm, and an honest node number table is generated. Then the leader election is performed, and an honest node is elected as the leader by modifying the voting process of the follower nodes. Finally, log replication is performed: the leader node sends information synchronization requests according to the honest node number table to ensure that the correct message reaches consensus among the nodes. Experimental results show that the CE-Raft algorithm can effectively exclude Byzantine nodes, improves the prediction accuracy of honest nodes, and achieves lower latency and higher throughput, so that the Internet of Vehicles can still complete data sharing safely and efficiently in the presence of malicious nodes.
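The honest-node filtering and modified election described above can be sketched as follows. The anomaly scores stand in for the output of the isolation forest model (which is not reproduced here), and the threshold and node names are illustrative assumptions.

```python
def build_honest_table(scores, threshold=0.6):
    """Nodes whose anomaly score exceeds the threshold are treated as Byzantine
    and excluded from the honest node table."""
    return {node for node, score in scores.items() if score <= threshold}

def elect_leader(votes, honest_table):
    """Count only votes cast by and for nodes in the honest table;
    a candidate needs a majority of the honest nodes to become leader."""
    tally = {}
    for voter, candidate in votes.items():
        if voter in honest_table and candidate in honest_table:
            tally[candidate] = tally.get(candidate, 0) + 1
    if not tally:
        return None
    leader, count = max(tally.items(), key=lambda kv: kv[1])
    return leader if count > len(honest_table) // 2 else None

# Toy scenario: node B behaves anomalously and is filtered out before voting.
scores = {"A": 0.2, "B": 0.9, "C": 0.3, "D": 0.1}
honest = build_honest_table(scores)
votes = {"A": "A", "C": "A", "D": "A", "B": "B"}
leader = elect_leader(votes, honest)
```

Because node B is outside the honest table, its vote for itself is discarded and cannot disturb the election, which is the intent of the CE-Raft voting modification.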
Cybersecurity Threat Intelligence Mining Algorithm for Open Source Heterogeneous Data
WEI Tao, LI Zhihua, WANG Changjie, CHENG Shunhang
Computer Science. 2023, 50 (6): 330-337.  doi:10.11896/jsjkx.220700073
Abstract PDF(1536KB) ( 323 )   
To address the problem of efficiently mining threat intelligence from open-source network security reports, a TI-NER-based intelligence mining (TI-NER-IM) method is proposed. Firstly, nearly ten years of IoT cybersecurity reports are collected and annotated to construct a threat intelligence entity recognition dataset. Secondly, in view of the poor performance of traditional entity recognition models in the field of threat intelligence mining, a threat intelligence entity identification model based on the self-attention mechanism and character embedding (TIEI-SMCE) is proposed, which fuses character embedding information; the potential dependency weights between words, contexts and other features are captured through the self-attention mechanism to accurately identify threat intelligence entities. Thirdly, a threat intelligence named entity recognition (TI-NER) algorithm based on the TIEI-SMCE model is proposed. Finally, the TI-NER-IM method is designed on top of TI-NER, enabling automated mining of threat intelligence from unstructured and semi-structured security reports. Experimental results show that compared with the BERT-BiLSTM-CRF model, the F1 value of TI-NER-IM increases by 1.43%.
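The self-attention mechanism that captures dependency weights between tokens can be sketched in NumPy as plain scaled dot-product attention. This is the generic mechanism only; the dimensions and random inputs are illustrative, and the paper's TIEI-SMCE model adds character embeddings and task-specific layers on top of it.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)     # softmax numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V, weights                      # contextual vectors, attention map

# Toy input: 5 tokens with 8-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Row i of `attn` is the dependency weight of token i on every other token, which is the quantity the abstract describes as capturing context for entity recognition.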
Interdiscipline & Frontier
Review on Similarity of Business Process Models
JIAN Kaiyu, SHI Yaqing, HUANG Song, XU Shanshan, YANG Zhongju
Computer Science. 2023, 50 (6): 338-350.  doi:10.11896/jsjkx.220700061
Abstract PDF(1453KB) ( 373 )   
With the growth of business process model repositories, traditional model management methods can no longer meet expectations in terms of efficiency and accuracy, and techniques that improve the efficiency of business process model management are in urgent demand. Business process similarity techniques can effectively improve the efficiency and accuracy of model analysis in scenarios such as model search and consistency checking, so research on business process similarity has become a hotspot in the model analysis field. In recent years researchers have obtained many valuable results, and business process similarity techniques have developed along many branches across different areas. Although comparisons exist within individual branches, systematic research on business process model similarity techniques is still lacking. We analyze the calculation of business process model similarity along five dimensions, namely text similarity, semantic similarity, structural similarity, behavioral similarity and human estimation-based similarity, and summarize the features of these measures. We find that business process model similarity techniques are commonly applied to conformance checking, standardization, search and reuse, and we analyze the research on these scenarios. Finally, the challenges of business process model similarity research are analyzed.
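The text-similarity dimension mentioned above can be illustrated with a toy measure over activity labels. This sketch uses edit-based label similarity plus a greedy node matching; the threshold and label sets are illustrative assumptions, and real surveyed techniques are considerably more refined (semantic, structural and behavioral measures work on the model graph, not just labels).

```python
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity between two activity labels (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def model_similarity(labels_a, labels_b, threshold=0.7):
    """Greedy node matching: fraction of activities paired with a
    sufficiently similar partner in the other model."""
    remaining = list(labels_b)
    matched = 0
    for a in labels_a:
        best = max(remaining, key=lambda b: label_similarity(a, b), default=None)
        if best is not None and label_similarity(a, best) >= threshold:
            matched += 1
            remaining.remove(best)
    return 2 * matched / (len(labels_a) + len(labels_b))

sim = model_similarity(["Approve order", "Ship goods"],
                       ["approve order", "ship the goods"])
```

Scores like `sim` are what a model-search engine would rank candidate models by in the search-and-reuse scenario.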
Solving Graph Coloring Problem Based on Grover Algorithm
LIU Xiaonan, LIU Zhengyu, XIE Haoshan, ZHAO Chenyan
Computer Science. 2023, 50 (6): 351-357.  doi:10.11896/jsjkx.220400051
Abstract PDF(3397KB) ( 308 )   
The Grover quantum search algorithm is a famous quantum algorithm designed for unstructured search problems. It can be used to solve problems such as graph coloring and shortest-path sorting, and can also effectively attack cryptosystems. Graph coloring is one of the most famous NP-complete problems. In this paper, the graph coloring problem is modeled as an undirected graph and then converted into a Boolean satisfiability problem using Boolean expressions. The steps and principles of evaluating Boolean expressions in a quantum circuit, and the transformation from the graph coloring problem to the Boolean satisfiability problem, are introduced. Finally, on the IBMQ cloud platform, the 2-coloring and 4-coloring problems of a three-node graph are simulated. Experimental results verify the feasibility of using the Grover algorithm to solve the graph coloring problem. In the 2-coloring problem with a search space of 8 and the 4-coloring problem with a search space of 64, the target items are found with success probabilities of nearly 82% and 97%, respectively. This paper uses the Grover algorithm to solve the 4-coloring problem, expands the experimental scale of the algorithm in this problem field, and improves the quantum circuits of existing experiments so that the qubit cost is lower and the success rate is higher, demonstrating the remarkable speedup of the Grover algorithm in large-scale search problems.
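The amplitude amplification at the heart of Grover's algorithm can be simulated classically, which makes the success probabilities above easy to reproduce in spirit. The sketch below is a state-vector simulation of the oracle and diffusion steps, not the paper's IBMQ circuit; the marked indices and iteration counts are illustrative.

```python
import math
import numpy as np

def grover_success(n_states, marked, iterations):
    """Classically simulate Grover iterations on a uniform superposition and
    return the probability of measuring one of the marked states."""
    amp = np.full(n_states, 1 / math.sqrt(n_states))  # uniform initial state
    for _ in range(iterations):
        amp[marked] *= -1                 # oracle: phase-flip the marked states
        amp = 2 * amp.mean() - amp        # diffusion: inversion about the mean
    return float(np.sum(amp[marked] ** 2))

# One marked item among 64: ~pi/4 * sqrt(64) = 6 iterations are near-optimal.
p_64 = grover_success(64, [0], iterations=6)
# Two marked items among 8 (cf. the 2-coloring search space): one iteration suffices.
p_8 = grover_success(8, [0, 5], iterations=1)
```

Starting from the uniform baseline of 1/64 and 2/8, both probabilities are amplified to nearly 1, illustrating the quadratic speedup over exhaustive search.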
Research Progress of Multi-agent Path Finding Based on Conflict-based Search Algorithms
WANG Zihan, TONG Xiangrong
Computer Science. 2023, 50 (6): 358-368.  doi:10.11896/jsjkx.220800151
Abstract PDF(1949KB) ( 379 )   
Multi-agent path finding is a classic search problem in the field of artificial intelligence, and conflict-based search is one of the best algorithms for solving it. This paper discusses fundamental research on multi-agent path finding and classifies recent results on conflict-based search algorithms and their variants. According to the type of improvement, the variants are divided into four categories: splitting strategy improvements, heuristic algorithms, bounded-suboptimal algorithms and typical conflict handling. The application of conflict-based search to extended multi-agent path finding problems is also introduced. Finally, the existing challenges are pointed out according to the advantages and disadvantages of current algorithms, and possible future research directions are given in view of these challenges.
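The two-level structure of conflict-based search can be sketched as follows: a low-level single-agent planner that respects constraints, and a high-level search over a constraint tree that branches on the first conflict found. This is a simplified sketch, assuming a 4-connected grid, vertex conflicts only (edge/swap conflicts are omitted for brevity), and BFS instead of A* at the low level.

```python
import heapq
from collections import deque

def plan_path(grid, start, goal, constraints, horizon=20):
    """Low-level search: space-time BFS avoiding vertex constraints {(cell, t)}."""
    queue = deque([(start, 0, (start,))])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        if cell == goal and all((goal, k) not in constraints for k in range(t, horizon)):
            return list(path)
        if t == horizon:
            continue
        x, y = cell
        for nxt in ((x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and (nxt, t + 1) not in constraints and (nxt, t + 1) not in seen):
                seen.add((nxt, t + 1))
                queue.append((nxt, t + 1, path + (nxt,)))
    return None

def first_conflict(paths):
    """Return (agent1, agent2, cell, t) for the earliest vertex conflict, or None."""
    for t in range(max(len(p) for p in paths)):
        occupied = {}
        for i, p in enumerate(paths):
            cell = p[min(t, len(p) - 1)]          # agents wait at their goal
            if cell in occupied:
                return occupied[cell], i, cell, t
            occupied[cell] = i
    return None

def cbs(grid, starts, goals):
    """High-level search: expand constraint-tree nodes in order of total path cost."""
    n = len(starts)
    cons = [set() for _ in range(n)]
    paths = [plan_path(grid, starts[i], goals[i], cons[i]) for i in range(n)]
    open_list = [(sum(map(len, paths)), 0, cons, paths)]
    tick = 1                                       # tie-breaker so heapq never compares sets
    while open_list:
        _, _, cons, paths = heapq.heappop(open_list)
        conflict = first_conflict(paths)
        if conflict is None:
            return paths                           # conflict-free solution
        a1, a2, cell, t = conflict
        for agent in (a1, a2):                     # branch: forbid (cell, t) for one agent
            child = [set(s) for s in cons]
            child[agent].add((cell, t))
            new_path = plan_path(grid, starts[agent], goals[agent], child[agent])
            if new_path is None:
                continue
            new_paths = list(paths)
            new_paths[agent] = new_path
            heapq.heappush(open_list, (sum(map(len, new_paths)), tick, child, new_paths))
            tick += 1
    return None

# Two agents crossing a 2x3 open grid; CBS resolves their meeting in the middle.
grid = [[0, 0, 0], [0, 0, 0]]
paths = cbs(grid, starts=[(0, 0), (0, 2)], goals=[(0, 2), (0, 0)])
```

The four variant categories in the survey refine exactly these pieces: how conflicts are split into constraints, how the high-level nodes are prioritized, and how optimality is relaxed for speed.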