Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 51 Issue 12, 15 December 2024
  
Integration of Digital Twin Network and Artificial Intelligence
Review of Key Technologies, Research Progress and Applications of Metaverse
WANG Wentong, ZHANG Zhijun, ZHANG Mingyang
Computer Science. 2024, 51 (12): 2-11.  doi:10.11896/jsjkx.240400166
With the rapid development of digital technology, the metaverse has become one of the focuses of public attention. As a new kind of virtual world, the metaverse will redefine the way people live and work. This paper introduces the concept and significance of the metaverse and studies its key technologies in depth, starting from its technical characteristics. Specifically, it analyzes six technologies: blockchain, interaction technology, artificial intelligence, the Internet of Things, computing power and operations, and digital twin; summarizes the research progress, problems, and challenges of these key technologies; and looks ahead to the research directions, development trends, and application prospects of the future metaverse.
Review of Digital Twin Based Satellite Network Mobile Edge Computing
SUN Yunhe, WANG Yu, ZHAO Liang, YANG Dongsheng, GUAN Yunchong
Computer Science. 2024, 51 (12): 12-19.  doi:10.11896/jsjkx.240700046
With the rapid development of communication technology, satellite communication has become a crucial part of the modern information and communication landscape. Satellite networks, with their extensive coverage, can provide ubiquitous low-latency services to users worldwide. However, the construction of satellite networks still faces significant challenges, such as high satellite design costs, the risks associated with launches, and the expensive testing and maintenance of satellite networks. The rise of digital twin (DT) technology presents a natural synergy with satellite networks, offering powerful data support and decision-making tools for satellite network operations. This paper reviews the research progress on DT-based mobile edge computing in satellite networks. It begins by introducing the limitations of terrestrial base station communications, which motivates the discussion of satellite network applications. Then, the composition of satellite networks, satellite classifications, and the concept of satellite edge computing are explained in detail. Subsequently, DT technology is introduced, with a focus on the DT-based satellite edge computing platform and DT-based mobile edge computing algorithms for satellite networks. Finally, the paper summarizes the existing issues in current research and outlines future development directions.
Knowledge-defined Intelligent Traffic Scheduling Mechanism in Computing Network
NIAN Yingpu, YI Bo, LI Peichen, WANG Xingwei, HUANG Min
Computer Science. 2024, 51 (12): 20-29.  doi:10.11896/jsjkx.240300064
At present, the knowledge-defined network empowers the development of AI technology, while the computing power network provides the computing resources that AI requires. The two are gradually converging to form the knowledge-defined computing network (KDCN). KDCN enables many new network applications, such as the metaverse, AR/VR, and east-west computing. These applications place heavy demands on computing and network resources, generating so-called heavy-hitter (HH) flows, which seriously aggravate congestion in KDCN. In response to this challenge, this paper proposes an intelligent traffic scheduling mechanism that addresses congestion in KDCN with deep Q-networks. In contrast to an offline training process, a real-time closed loop is established between traffic detection and acquisition, model training, and congestion-flow scheduling decisions, realizing online training of the deep Q-network model. Based on this closed-loop control, the intelligent flow scheduling model evolves continuously through ongoing learning and provides real-time decisions. Experimental results show that the proposed algorithm is superior to existing methods in resource utilization, throughput, average packet loss rate, and other metrics.
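To make the closed loop concrete, the sketch below is an illustration rather than the paper's system: the traffic probe, the reward, and all sizes are stubs. It shows how acting, observing, and training interleave in one online loop instead of a separate offline training phase:

```python
# Minimal online DQN loop: act with the current network, observe live
# feedback, train one step, repeat. All environment hooks are stubs.
import random
import torch
import torch.nn as nn

qnet = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 3))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def measure_traffic():                     # stub: live flow statistics
    return torch.rand(4)

def apply_and_observe(state, action):      # stub: reroute an HH flow,
    return -float(state[action]), torch.rand(4)   # get congestion feedback

state = measure_traffic()
for step in range(1000):
    if random.random() < 0.1:              # epsilon-greedy exploration
        action = random.randrange(3)
    else:
        with torch.no_grad():
            action = int(qnet(state).argmax())
    reward, next_state = apply_and_observe(state, action)
    with torch.no_grad():
        target = reward + 0.9 * qnet(next_state).max()
    loss = (qnet(state)[action] - target) ** 2    # TD error on live data
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state                     # in deployment the loop never ends
```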
Deep Contrastive Siamese Network Based Repeated Event Identification
LI Zichen, YI Xiuwen, CHEN Shun, ZHANG Junbo, LI Tianrui
Computer Science. 2024, 51 (12): 30-36.  doi:10.11896/jsjkx.240300025
In China, citizens can report issues they encounter in daily life to the government and seek assistance by calling the 12345 citizen hotline. However, many events are reported multiple times, which places significant pressure on the staff responsible for event allocation, resulting in low efficiency of event disposal and a waste of public resources. Identifying repeated events requires precise analysis of textual semantics and contextual relationships. To address this problem, this paper proposes a repeated event identification method based on a deep contrastive siamese network. By evaluating the similarity between event descriptions, the method identifies events with the same demands. First, it reduces the number of candidate events through retrieval and filtering. Then, it fine-tunes a pre-trained BERT model through contrastive learning to learn distinct semantic representations of event descriptions. Finally, the event title is introduced as contextual information, and a siamese network with a classifier is used to identify repeated events. Experimental results on the 12345 event dataset of Nantong demonstrate that the proposed method outperforms baseline methods across various evaluation metrics, particularly in the F0.5 score, which fits the repetition-detection scenario. The proposed method can effectively identify repeated events and improve the efficiency of event handling.
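As an illustration of the final comparison step (not the paper's released model: the checkpoint name, mean pooling, and cosine threshold are assumptions, and the paper's siamese network uses a classifier head rather than a fixed threshold), a minimal sketch with a BERT encoder:

```python
# Encode two event descriptions, mean-pool token embeddings, score their
# similarity. "bert-base-chinese" stands in for the fine-tuned encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)      # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)        # mean pooling

a = embed("小区门口有车辆占用消防通道")   # event description 1
b = embed("消防通道被私家车堵住了")       # event description 2
score = torch.cosine_similarity(a, b).item()
print("repeated" if score > 0.85 else "distinct", round(score, 3))
```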
Deterministic Transmission Scheduling Mechanism for Mixed Traffic Flows Towards Digital Twin Networks
WANG Kewen, ZHANG Weiting, LIAO Peixi
Computer Science. 2024, 51 (12): 37-45.  doi:10.11896/jsjkx.240200063
A deterministic transmission scheduling mechanism based on deep reinforcement learning under a digital twin architecture is proposed for the end-to-end transmission of mixed traffic flows in railway operation and maintenance scenarios, namely the end-to-end transmission scheduling mechanism for online mixed traffic (E2ETSM-OMT). Following a differentiated scheduling strategy, the proposed mechanism divides traffic flows into three categories, namely monitoring and data collection flows, control and execution flows, and data analysis and business optimization flows, and implements cross-domain end-to-end low-latency transmission through deterministic networking technologies. Meanwhile, through model mapping and behavior mapping, the physical space is projected into the virtual space with high precision in all dimensions. In the digital twin network, after constructing the topology of mixed flows, a deep reinforcement learning (DRL) agent makes pre-allocation decisions for transmission paths and time slot resources, taking both effectiveness and efficiency into account, so as to reduce scheduling conflicts and resource competition among different traffic flows. Compared with existing mechanisms, digital twin technology realizes mutual mapping between the physical world and the virtual world, enables the application of DRL in non-stationary communication environments, and avoids the loss of service quality caused by exploration in the real network. Simulation results show that the proposed digital twin oriented deterministic transmission scheduling mechanism achieves high transmission benefits with low overall end-to-end delay while ensuring successful scheduling of mixed traffic flows.
Computer Software
Regression Test Case Prioritization Approach Based on Deep Learning
ZHANG Lizheng, YANG Qiuhui, LI Xingjia, DAI Shengxin
Computer Science. 2024, 51 (12): 46-52.  doi:10.11896/jsjkx.231000147
Prioritizing test cases in regression testing can expedite the detection of code defects, save testing time and resources, and enhance testing efficiency. However, existing test case prioritization methods often fail to consider code change information and test case execution history simultaneously, and they do not adequately account for differences in the length of test case execution histories, resulting in poor prioritization outcomes. To address these issues, this paper introduces a deep learning based approach for prioritizing regression test cases. Initially, it constructs classification models based on code change information and historical execution data separately. Subsequently, it identifies classes affected by code changes using inter-class relationship graphs and classifies test cases belonging to these classes, as well as those that have recently exposed defects. Finally, it employs the classification models and a heuristic sorting method to prioritize the test cases, then merges the sorted results through an iterative process. Experimental results on 6 projects selected from the preprocessed RTPTorrent dataset demonstrate that: 1) in scenarios without time constraints, the proposed approach achieves strong prioritization results across all projects, with an APFD of 0.972 on the cloudify project; 2) under time-constrained conditions, the proposed approach outperforms popular existing prioritization methods in terms of the NAPFD metric.
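For reference, the APFD metric quoted above has a standard closed form: with n test cases, m faults, and TF_i the position of the first test that reveals fault i, APFD = 1 - (ΣTF_i)/(nm) + 1/(2n). A small self-contained sketch (the fault map is toy data):

```python
# APFD (Average Percentage of Faults Detected) for a prioritized order.
# faults_of maps each test case to the set of faults it detects.
def apfd(order, faults_of):
    n = len(order)
    faults = set().union(*faults_of.values())
    m = len(faults)
    first_pos = {}
    for i, tc in enumerate(order, start=1):        # 1-based positions
        for f in faults_of.get(tc, ()):
            first_pos.setdefault(f, i)             # first test exposing f
    return 1 - sum(first_pos[f] for f in faults) / (n * m) + 1 / (2 * n)

# toy example: t2 finds both faults, so running it first maximizes APFD
faults_of = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": set()}
print(apfd(["t2", "t1", "t3"], faults_of))   # 0.833...
print(apfd(["t3", "t1", "t2"], faults_of))   # 0.333..., a poor ordering
```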
DeepGenFuzz: An Efficient PDF Application Fuzzing Test Case Generation Framework Based on Deep Learning
LIU Jiahao, JIANG He
Computer Science. 2024, 51 (12): 53-62.  doi:10.11896/jsjkx.231100179
PDF is a widely used and important document format. Due to the complexity of PDF files, defects in PDF-related applications can lead to serious consequences such as malicious attacks and incorrect information rendering. Therefore, testing PDF-related applications has become a hot research topic. The most effective current method is grammar-based fuzz testing, but it often requires a significant amount of manual work to summarize and write complex grammar rules, which seriously hinders the efficient automation of test case generation. Deep learning techniques provide a feasible solution to this challenge. However, the quality of test cases generated by current methods is generally low, and their bug-finding ability is poor. To improve on this, three main challenges need to be addressed: dataset filtering, balancing test case coverage gains against test case size growth, and efficient mutation of test cases. Therefore, this paper proposes a deep learning based framework for efficient PDF application fuzz test case generation called DeepGenFuzz. It utilizes models such as CNN, Seq2Seq, and Transformer to generate high-quality PDF test cases through steps including data filtering, object generation, object appending, and efficient mutation. Evaluations on PDF applications such as MuPDF show that DeepGenFuzz generates test cases with significantly higher average code coverage than state-of-the-art tools such as Learn&Fuzz and IUST-DeepFuzz, with improvements ranging from 8.12% to 61.03%. Its bug-finding capability is also far superior to that of Learn&Fuzz and IUST-DeepFuzz. To date, 31 previously unreported bugs have been discovered in the seven most popular PDF applications, of which 25 have been confirmed or fixed, covering all tested programs.
Automatic Test Case Generation Method for Automotive Electronic Control System Verification
LI Zhanqi, WU Xinwei, ZHANG Lei, LIU Quanzhou, XIE Hui, XIONG Deyi
Computer Science. 2024, 51 (12): 63-70.  doi:10.11896/jsjkx.240900093
With the development of “software-defined vehicles”, the complexity of automotive software functions and the demand for rapid development have imposed higher requirements on the verification of electronic control systems. Currently, the development of test flow charts for electronic control system software functions mainly relies on manual methods, which are inefficient and susceptible to human factors. This paper details the task of, and the challenges in, automatic test case generation for automotive electronic control system verification, and proposes an automatic test flow chart generation method based on large language models (LLMs) to improve development efficiency and reduce labor costs. The method includes constructing domain task datasets and selecting appropriate LLM application routes. The study explores the advantages and disadvantages of two technical routes: traditional language model fine-tuning and LLM API adaptation. Experiments validate the performance of different LLM APIs on test case generation tasks and the effectiveness of prompt engineering techniques in enhancing LLM API performance. In summary, this paper proposes an efficient method for automatically generating automotive test flow charts, demonstrating the potential of LLMs in improving the efficiency of automotive software testing.
SSFuzz: State-sensitive Greybox Fuzzing for Network Protocol Services
LIN Jiahan, RAN Meng, PENG Jianshan
Computer Science. 2024, 51 (12): 71-78.  doi:10.11896/jsjkx.231000018
Network protocol services are the interface through which personal devices interact with the Internet, so their vulnerabilities pose a serious threat to users’ privacy and information security. State-of-the-art network protocol greybox fuzzing tools introduce state feedback on top of code coverage, further filtering effective mutated seeds by analysing the state information of network protocol services. However, different fuzzing tools define network protocol service state differently: for example, AFLNET extracts state by analysing the contents of server response packets, while StateAFL defines long-lived memory as program state. For state collection, SGFuzz identifies assignment statements of state variables and instruments them by analysing enum type definitions. However, SGFuzz cannot identify indirect assignments to state variables, so its identification of state variables is incomplete. Meanwhile, when constructing state machines, different fuzzing techniques define state machine nodes differently, making it difficult to use multiple state collection strategies in the same fuzzing tool at the same time. In addition, in terms of experimental design, existing schemes tend to compare code coverage over the same period of time. However, the growth of code coverage is affected by various factors, such as throughput and seed screening strategies; same-time coverage experiments are suitable for comparisons between different fuzzing tools, not for ablation experiments on individual modules within them. This paper proposes SSFuzz. Specifically, SSFuzz first investigates state-variable instrumentation: it identifies indirect assignments to state variables from the abstract syntax tree information produced during compilation, and can therefore instrument state-variable assignment statements more accurately. Secondly, SSFuzz defines a state machine for guiding state screening, which facilitates the joint construction of state machines by different state feedback strategies. Experiments show that SSFuzz can instrument most network protocol services and, compared to SGFuzz, additionally covers indirect assignment statements. In addition, we discuss experimental methods suitable for evaluating the effectiveness of state machines and demonstrate that SSFuzz achieves higher path coverage with a smaller number of test samples.
Ensemble Learning Based Open Source License Detection and Compatibility Assessment
BAI Jianghao, PIAO Yong
Computer Science. 2024, 51 (12): 79-86.  doi:10.11896/jsjkx.231200100
The quality and evolution of software are profoundly influenced by the security and reliability of the software supply chain. An essential element of this chain is the analysis of the licenses associated with different software components. Open source licenses play a vital role in defining the conditions for using open source software, safeguarding intellectual property, and ensuring the sustained development of open source projects. To mitigate legal risks and protect against property losses, it is imperative to accurately identify open source software licenses and assess their compatibility. In this paper, we propose a method for detecting open source licenses using ensemble learning, complemented by a compatibility-based recommendation system. Our approach leverages ensemble learning techniques, with particular emphasis on large language models, and is augmented with rule matching to bolster the accuracy of license detection. Compatibility assessments and license recommendations are then derived using directed graph algorithms. Experimental results validate the effectiveness of the method, showcasing not only reduced maintenance costs and improved scalability but also superior detection performance compared to traditional methods. The proposed approach excels at identifying compatibility issues and provides dependable recommendations, thereby contributing to a more secure and reliable software supply chain.
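As an illustration of the directed-graph compatibility idea (the tiny edge set below is a toy, not an authoritative compatibility matrix; real matrices are larger and have conditions):

```python
# An edge u -> v means code under license u may be combined into a work
# released under license v; compatibility is reachability in this graph.
COMPAT = {
    "MIT":        {"Apache-2.0", "GPL-3.0"},
    "Apache-2.0": {"GPL-3.0"},
    "GPL-3.0":    set(),
}

def can_relicense(src: str, dst: str) -> bool:
    if src == dst:
        return True
    seen, stack = set(), [src]
    while stack:                          # DFS over the compatibility graph
        cur = stack.pop()
        for nxt in COMPAT.get(cur, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

print(can_relicense("MIT", "GPL-3.0"))    # True (directly or via Apache-2.0)
print(can_relicense("GPL-3.0", "MIT"))    # False: copyleft does not relax
```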
Value Assessment System Oriented for Open-source Software Developers and Its Empirical Research
YOU Lan, TIAN Mingyan, ZHOU Ye, CHEN Zhijun, WANG Wei, JIN Hong, ZENG Xing, CUI Haibo
Computer Science. 2024, 51 (12): 87-99.  doi:10.11896/jsjkx.240100169
Assessing the value of open-source software developers in a scientific and objective manner is an important issue in the open-source field. Existing research methods face challenges that include limited evaluation metrics and the difficulty of determining metric weights. To mitigate these issues, this paper proposes a multi-dimensional, multi-level assessment system for open-source software developers. The system is informed by an analysis of big data from the open-source ecosystem and combines subjective and objective evaluation methods. By considering developers’ performance in project management, programming, team collaboration, learning, and dedication, the proposed system comprehensively and objectively assesses their value using five primary indicators, twelve secondary indicators, and seven tertiary indicators. The CRITIC method is employed to determine the weights of the various dimensions, overcoming the low accuracy caused by experience-based weights. Finally, multiple empirical studies are conducted using GitHub’s 2020 global open-source ecosystem data to validate the effectiveness and feasibility of the assessment system. This research provides an objective, scientific, and practical method for the talent measurement, discovery, and management of open-source software developers. The experimental code can be obtained from the GitHub platform.
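The CRITIC weighting mentioned above is a standard technique: a criterion gets more weight when it varies more (contrast intensity) and is less correlated with the other criteria (low redundancy). A compact sketch with toy indicator data:

```python
# CRITIC weights: w_j ∝ σ_j * Σ_k (1 - r_jk) on min-max normalized columns.
import numpy as np

def critic_weights(X: np.ndarray) -> np.ndarray:
    # X: (developers, indicators)
    Z = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)   # normalize columns
    sigma = Z.std(axis=0, ddof=1)                        # contrast intensity
    R = np.corrcoef(Z, rowvar=False)                     # inter-criteria corr.
    conflict = (1 - R).sum(axis=0)                       # redundancy penalty
    C = sigma * conflict                                 # information content
    return C / C.sum()

X = np.array([[120, 0.8, 15],     # toy data: commits, review rate, issues
              [ 40, 0.9,  3],
              [200, 0.4, 30]], dtype=float)
print(critic_weights(X).round(3))                        # weights sum to 1
```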
High Performance Computing
Tile Selection Algorithm Based on Data Locality
LIAO Qihua, NIE Kai, HAN Lin, CHEN Mengyao, XIE Wenbing
Computer Science. 2024, 51 (12): 100-109.  doi:10.11896/jsjkx.231100060
Existing polyhedral compilation frameworks (such as Pluto, LLVM/Polly, and GCC/Graphite) use fixed block sizes when performing loop tiling, which cannot fully exploit the cache characteristics of different hardware, resulting in significant performance differences. In response, many loop tiling algorithms based on multi-level caches and data locality have emerged, but these algorithms often optimize only specific loop programs or lack comprehensive consideration, and are not suitable for porting into general-purpose compilers. This paper proposes a tile size selection algorithm based on data locality that considers not only the impact of the cache replacement policy but also the load balancing problem in multi-core environments. The algorithm is implemented on top of the Polly module in LLVM, and test cases from Pluto and PolyBench are selected for single-core and multi-core testing. The experimental results show that, compared to the default tiling of LLVM/Polly, the proposed algorithm achieves average speedups of 2.03 and 2.05 on two hardware platforms in a single-core environment, and exhibits good parallel scalability in multi-core environments.
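As a rough illustration of capacity-driven tile selection (the three-tile working set and the headroom factor are simplifying assumptions, not the paper's model, which additionally accounts for replacement policy and multi-core load balance):

```python
# Pick a square tile T for C += A*B so that three T*T tiles fit in one
# cache level, with headroom left for the replacement policy.
import math

def tile_size(cache_bytes: int, elem_bytes: int = 8, headroom: float = 0.75) -> int:
    # 3 tiles of T*T elements (A, B and C blocks) must fit in the cache
    t = math.isqrt(int(cache_bytes * headroom) // (3 * elem_bytes))
    return max(1, 1 << (t.bit_length() - 1))   # round down to a power of two

print(tile_size(32 * 1024))     # typical 32 KiB L1d -> 32
print(tile_size(1024 * 1024))   # 1 MiB L2 -> larger tiles
```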
Automatic Mixed-precision Optimisation of Matrix Multiplication Based on Polyhedral Models
HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, XU Jinchen
Computer Science. 2024, 51 (12): 110-119.  doi:10.11896/jsjkx.230800106
Mixed precision is a numerical computing technique that improves computational efficiency by converting some of the data in a computation from high precision to low precision. Matrix multiplication has important and wide applications in computer science and mathematics, and using mixed-precision techniques to speed up matrix multiplication is a challenging task. Existing mixed-precision optimisation suffers from several problems, such as high storage overhead and having to be implemented on specific hardware units, which limits the deployment options of models or algorithms and reduces their portability. To address these problems, this paper proposes and implements AGMMMPC, an automatic mixed-precision code generation tool based on polyhedral models. It adds code generation for basic mixed-precision matrix multiplication (low-precision multiplication with high-precision accumulation) to the source-to-source PPCG compiler, and uses a precision tuning (PT) algorithm to find high-frequency error points in the basic mixed-precision computation. These points are processed by a high-precision calculation method, while the rest are processed by the basic mixed-precision method, which effectively reduces the error of the basic mixed-precision computation and realises, for the first time, automatic source-to-source generation of mixed-precision code for matrix multiplication. Experiments show that, with high-precision computation as the baseline, the advanced mixed-precision code generated by AGMMMPC achieves a maximum speedup of 1.39 and a geometric-mean speedup of 1.14 on the X86 platform.
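A numpy sketch of the two-level idea (illustration only: the error probe and threshold below are stand-ins for the paper's PT algorithm, and numpy does not model the generated C code):

```python
# Do the bulk of C = A @ B in low precision, then recompute only the
# entries whose estimated error is large at full precision.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256))
B = rng.standard_normal((256, 256))

C_low = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float64)

# probe: estimate per-entry error against a cheap float32 reference
C_probe = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)
hot = np.abs(C_low - C_probe) > 0.05            # "high-frequency error" points

rows, cols = np.nonzero(hot)
C_low[rows, cols] = np.einsum("ij,ij->i", A[rows, :], B[:, cols].T)  # fp64 fix-up

C_ref = A @ B
print(hot.mean(), np.abs(C_low - C_ref).max())  # few hot points, smaller error
```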
Study on Distributed Training Optimization Based on Hybrid Parallel
XU Jinlong, LI Pengfei, LI Jianan, CHEN Biaoyuan, GAO Wei, HAN Lin
Computer Science. 2024, 51 (12): 120-128.  doi:10.11896/jsjkx.231200128
Large-scale neural network training is a hot topic in deep learning, and distributed training is one of the most effective methods for training large neural networks across multiple nodes. Distributed training typically involves three parallelization methods: data parallelism, inter-layer parallelism, and intra-layer parallelism. However, in existing frameworks, inter-layer parallelism requires manual model partitioning, which increases the abstraction burden of model design. To address this issue, we propose a node-constrained relationship search algorithm that automates the model partitioning process. Moreover, in traditional data parallelism and inter-layer parallelism, complex model constraints and the required communication operations impose strict serialization, which limits the overlap of computation and communication. To overcome this, we introduce a synchronization optimization algorithm that enables computation and communication to overlap, effectively enhancing overall training efficiency. The experiments train GPT-2 models of different sizes as well as AlexNet, VGG16, and ResNet50. With the synchronization optimization algorithm under a 6-node configuration, the training performance of the GPT2-XL, GPT2-LARGE, and GPT2-MEDIUM models improves, with speedups of 1.14, 1.18, and 1.23, respectively. Under a 1-node configuration, performance also improves for AlexNet, VGG16, and ResNet50, with speedups of 1.31, 1.14, and 1.03, respectively. The experimental results indicate that the synchronization optimization algorithm effectively enhances training efficiency under hybrid parallelism.
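A minimal sketch of the computation/communication overlap idea using PyTorch's asynchronous collectives (an illustration, not the paper's algorithm; the tensors stand in for real gradients, and the process group is assumed to be launched with e.g. torchrun --nproc_per_node=2):

```python
# Launch an asynchronous all-reduce on one layer's gradient and keep
# computing while it is in flight; wait only before the weight update.
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://")

grad_prev = torch.randn(1024)                    # gradient of layer k+1
work = dist.all_reduce(grad_prev, op=dist.ReduceOp.SUM, async_op=True)

grad_curr = torch.randn(1024).mul_(2.0)          # backprop of layer k
                                                 # proceeds concurrently
work.wait()                                      # sync before the update
grad_prev /= dist.get_world_size()               # average across workers

dist.destroy_process_group()
```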
Automatic Pipeline Parallel Training Framework for General-purpose Computing Devices
ZHONG Zhenyu, LIN Yongliang, WANG Haotian, LI Dongwen, SUN Yufei, ZHANG Yuzhi
Computer Science. 2024, 51 (12): 129-136.  doi:10.11896/jsjkx.231000110
Training large-scale neural networks usually exceeds the memory and computing capacity of a single computing node, requiring distributed training across multiple nodes. Existing distributed deep learning frameworks are mainly designed for specific hardware environments and cannot effectively adapt to the variety of general-purpose computing devices. To support the efficient training of large-scale deep neural networks, this paper implements a general-purpose automatic pipeline-parallel distributed training framework. The framework combines a pipeline-based model parallel strategy with an algorithm that automatically partitions the neural network model, realizing automatic parallelization and training of large-scale neural network models and training data on general-purpose computer clusters, including China’s new generation of supercomputers, and significantly reducing the memory and computing pressure on a single node. The framework requires no manual tuning and can automatically and efficiently deploy deep neural networks to multi-node distributed environments. It is suitable not only for supercomputers and other high-performance computer clusters but also for other general-purpose distributed computing environments, providing support for the automatic distributed training of large-scale neural networks.
Efficient Task Flow Parallel System for New Generation Sunway Processor
FU You, DU Leiming, GAO Xiran, CHEN Li
Computer Science. 2024, 51 (12): 137-146.  doi:10.11896/jsjkx.231100135
China’s independently developed next-generation Sunway supercomputer features a more powerful memory system and higher computational density than its predecessor, the Sunway TaihuLight. Its primary programming model remains the bulk synchronous parallelism (BSP) model. The sequential task flow (STF) model, based on data flow information, automates the task parallelization of serial programs and achieves asynchronous parallelism through fine-grained synchronization between tasks. Compared with the global synchronization of the BSP model, STF offers higher parallelism and more balanced load distribution, providing users with a new option for efficiently utilizing the Sunway platform. However, on many-core systems, the runtime overhead of the STF model directly impacts the performance of parallel programs. This paper first analyzes two characteristics of the new Sunway processor that affect the efficient implementation of the STF model. Then, leveraging the features of the processor architecture, it proposes an agent-based dataflow graph construction mechanism to meet the modeling requirements and a lock-free centralized task scheduling mechanism to reduce scheduling overhead. Finally, based on these techniques, an efficient task flow parallel system is implemented for the AceMesh model. Experiments show that the implemented task flow parallel system has significant advantages over traditional runtime support, achieving a maximum speedup of 2.37 in fine-grained task scenarios; the performance of AceMesh exceeds that of the OpenACC model on the Sunway platform, with a maximum speedup of 2.07 for typical applications.
Load Prediction Method of Cloud Resource Based on v-Informer
YOU Wenlong, DENG Li, LI Ruilong, XIE Yuxin, REN Zhengwei
Computer Science. 2024, 51 (12): 147-156.  doi:10.11896/jsjkx.231000098
Cloud computing technology is now widely used. As the number of users grows, the allocation and management of cloud computing resources becomes increasingly important, and accurate load prediction is an important basis for both. Based on the Informer model, this paper proposes a long-term CPU load prediction method for highly dynamic cloud platform tasks, called v-Informer. v-Informer decomposes the trends in the load sequence through variational mode decomposition and introduces a multi-head self-attention mechanism to capture long-term dependencies and local nonlinear relationships. At the same time, a gradient centralization technique is used to improve the optimizer and reduce computational cost. Experiments are carried out on data from the Microsoft and Google cloud platforms. The results show that, compared with the existing CPU load prediction models LSTM, Transformer, TCN, and CEEMDAN-Informer, the prediction error of v-Informer is reduced by 34%, 19%, 15%, and 6.5% respectively on the Google dataset, and by 32%, 16%, 12%, and 7% respectively on the Microsoft dataset, showing better prediction accuracy.
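Reading the abstract's optimizer improvement as gradient centralization (our interpretation of the original wording), its core is one line applied before each optimizer step:

```python
# Gradient centralization: subtract from each weight gradient its mean
# over all dimensions except the output dimension, before optimizer.step().
import torch

def centralize_gradients(model: torch.nn.Module) -> None:
    for p in model.parameters():
        if p.grad is None or p.grad.dim() < 2:   # skip biases / 1-D params
            continue
        dims = tuple(range(1, p.grad.dim()))
        p.grad -= p.grad.mean(dim=dims, keepdim=True)

model = torch.nn.Linear(8, 4)
model(torch.randn(2, 8)).sum().backward()
centralize_gradients(model)                      # call before optimizer.step()
print(model.weight.grad.mean(dim=1))             # ~0 per output row
```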
Database & Big Data & Data Science
Vehicle Trajectory Prediction Based on Spatial-Temporal Graph Attention Convolutional Network
YUAN Jing, XIA Ying
Computer Science. 2024, 51 (12): 157-165.  doi:10.11896/jsjkx.231100145
Vehicle trajectory prediction is a crucial technology in fields such as traffic management, intelligent vehicles, and autonomous driving, and accurate prediction contributes to safe driving. In urban traffic scenarios, the spatial-temporal features of vehicle trajectory data are complex and variable. To fully capture the dynamic spatial-temporal correlations in the data, enhance trajectory prediction accuracy, and simultaneously reduce model complexity, this paper proposes a spatial-temporal graph attention convolutional network (STGACN). It uses a trajectory information embedding module to transform historical vehicle trajectory data into spatial-temporal graphs, then extracts and combines the temporal and spatial features of the trajectory data through stacked spatial-temporal convolution blocks; finally, gated recurrent units perform encoding and decoding to obtain the predicted trajectory. The model employs a gated convolutional network composed of dilated causal convolutions and gating units to extract temporal features, avoiding the redundant iteration introduced by recurrent neural networks. The fusion of spatial-temporal features in the stacked convolution blocks lets the model attend to richer scene features, yielding fewer parameters, faster trajectory prediction inference, and improved prediction accuracy. Experiments on real trajectory datasets, including Argoverse and NGSIM, demonstrate that the proposed STGACN model achieves higher prediction accuracy and efficiency than the compared baseline models.
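A sketch of the gated temporal block described above (shapes and hyperparameters are illustrative): a dilated causal 1-D convolution whose output is split into a content half and a gate half (a GLU), so no recurrent iteration over time steps is needed.

```python
# Dilated causal convolution + gating unit (GLU) for temporal features.
import torch
import torch.nn as nn

class GatedCausalConv(nn.Module):
    def __init__(self, channels: int, kernel: int = 3, dilation: int = 2):
        super().__init__()
        self.pad = (kernel - 1) * dilation          # left-pad => causal
        self.conv = nn.Conv1d(channels, 2 * channels, kernel, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        y = self.conv(nn.functional.pad(x, (self.pad, 0)))
        content, gate = y.chunk(2, dim=1)
        return content * torch.sigmoid(gate)        # GLU: P ⊙ σ(Q)

block = GatedCausalConv(channels=16)
out = block(torch.randn(8, 16, 30))                 # 8 trajectories, 30 steps
print(out.shape)                                    # torch.Size([8, 16, 30])
```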
GBDEN: A Fast Clustering Algorithm for Large-scale Data Based on Granular Ball
XUE Renxuan, YI Shichao, WANG Pingxin
Computer Science. 2024, 51 (12): 166-173.  doi:10.11896/jsjkx.240600002
Clustering partitions the objects in a dataset into groups or clusters based on feature similarity, so that objects within a group are more similar to each other than to objects in other groups. Density-based clustering is an unsupervised clustering approach that does not require the number of clusters to be specified in advance; instead, it adaptively determines the clusters from the density of the data. Compared with methods such as K-MEANS, density-based clustering is less sensitive to the selection of initial points and can produce more robust and reliable clustering results. Among density-based clustering algorithms, DENCLUE (DENsity-based CLUstEring) uses a hill-climbing approach grounded in a solid mathematical foundation; it performs well on datasets with considerable noise and can cluster arbitrarily shaped clusters in high-dimensional datasets. However, processing large-scale datasets with DENCLUE requires significant computational resources and time. To address this challenge, this paper proposes a fast clustering algorithm for large-scale data based on granular balls: a coarse-grained granular ball is created initially and then refined into fine-grained granular balls, which serve as the input to the DENCLUE algorithm for clustering. Experimental findings demonstrate the effectiveness of this approach across multiple datasets.
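A sketch of the granular-ball coarsening step (an illustration under assumptions: the compactness threshold, minimum ball size, and 2-means splitting rule below are generic choices, not necessarily the paper's):

```python
# Recursively split the data with 2-means until each ball is compact,
# then hand the much smaller set of ball centers to DENCLUE.
import numpy as np
from sklearn.cluster import KMeans

def granular_balls(X, max_radius=0.5, min_size=8):
    center = X.mean(0)
    radius = np.linalg.norm(X - center, axis=1).mean()
    if radius <= max_radius or len(X) <= min_size:
        return [(center, len(X))]                 # ball is fine-grained enough
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # split in two
    return (granular_balls(X[labels == 0], max_radius, min_size)
            + granular_balls(X[labels == 1], max_radius, min_size))

X = np.vstack([np.random.randn(500, 2), np.random.randn(500, 2) + 5])
balls = granular_balls(X)
centers = np.array([c for c, _ in balls])
print(len(X), "->", len(centers), "DENCLUE inputs")   # large reduction
```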
Multitask Classification Algorithm of ECG Signals Based on Gradient Magnitude and Direction Adjustment
ZHANG Xue, TIAN Lan, ZENG Ming, LIU Junhui, ZONG Shaoguo
Computer Science. 2024, 51 (12): 174-180.  doi:10.11896/jsjkx.230800083
Cardiovascular diseases pose an increasingly serious threat to human health and safety, and ECG signals can be used to diagnose and classify related diseases. Most existing ECG classification algorithms adopt a single-task learning model, which cannot make comprehensive use of the complementary features across multiple tasks. A multi-task learning model, by contrast, can learn multiple related tasks at the same time and share features across them, helping to improve classification performance on all tasks. Combining deep learning and multi-task learning, a multi-task classification algorithm for ECG signals based on loss optimization is proposed. The multi-class ECG classification task is decomposed into multiple binary classification tasks, and the loss is optimized with respect to both the magnitude and the direction of the task gradients, avoiding the negative transfer caused by manually set task loss weights and the cancellation of task losses, and improving multi-class ECG classification performance. The algorithm is evaluated on the PTB-XL database by decomposing the 23-class task into 23 binary classification tasks. Experimental results show that the average macro area under the curve (AUC) reaches 0.950, the accuracy reaches 96.50%, the label-based F1 score reaches 0.583, and the sample-based F1 score reaches 0.777. Compared with single-task learning algorithms, the proposed algorithm shows good performance in multi-class ECG classification.
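As an illustration of the gradient-direction part (the paper's own adjustment rule is not reproduced here; the sketch uses a PCGrad-style projection as a well-known stand-in for resolving conflicting task gradients):

```python
# When two task gradients conflict (negative dot product), project the
# conflicting component away before combining them into a shared update.
import torch

def project_conflicts(grads):
    adjusted = [g.clone() for g in grads]
    for i, gi in enumerate(adjusted):
        for j, gj in enumerate(grads):
            dot = torch.dot(gi, gj)
            if i != j and dot < 0:                 # conflicting directions
                gi -= dot / gj.norm() ** 2 * gj    # remove the projection
    return torch.stack(adjusted).mean(0)           # shared update direction

g_task1 = torch.tensor([1.0, 1.0])
g_task2 = torch.tensor([-1.0, 0.5])                # partially conflicting
print(project_conflicts([g_task1, g_task2]))
```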
Computer Graphics & Multimedia
Millimeter Wave Radar Human Activity Recognition Algorithm Based on Feature Fusion
HAN Chong, FAN Weibei, GUO Ao
Computer Science. 2024, 51 (12): 181-189.  doi:10.11896/jsjkx.231200170
Human activity recognition based on millimeter-wave radar captures the electromagnetic wave signals of human activities in a non-contact way. It is not easily disturbed by smoke or lighting, offers a degree of privacy protection, and has become a current research hotspot. However, existing algorithms suffer from problems such as single-feature input, complex model structures, and insufficiently verified generalization ability. A human activity recognition algorithm with a two-stream feature fusion convolutional neural network, named 2S-FCNN, is proposed. It uses a residual neural network equipped with an attention mechanism as the backbone, takes the time-distance image and the time-velocity image as parallel inputs, and fuses the features using feature-weighted score fusion for classification and recognition, achieving high recognition accuracy. In-depth comparative experiments against existing algorithms on both public and self-built datasets show that the proposed algorithm performs well in both recognition rate and generalization ability.
Urban Illegal On-road Parking Detection Algorithm for High Dynamic Video Scenarios
CHENG Lianghua, HUANG Ruixue, SHEN Xin
Computer Science. 2024, 51 (12): 190-198.  doi:10.11896/jsjkx.231100096
Growing parking conflicts have led to serious parking violations on urban roads, posing a major safety hazard to urban traffic. Timely and effective monitoring and handling of illegal parking events is therefore essential to urban traffic safety. However, existing illegal parking monitoring methods based on manual patrols and fixed surveillance cameras suffer from low efficiency and limited monitoring range, making it difficult to meet the demands of large-scale urban monitoring. As an emerging sensing paradigm, vehicular crowdsensing offers promising opportunities for large-scale, low-cost urban parking monitoring by motivating users to collect road videos while driving and upload them to the cloud. However, the complexity of in-vehicle video scenes leads to heavy losses in vehicle target tracking and high complexity in parking judgment, which poses a serious challenge to accurate illegal on-road parking detection. To address these challenges, we propose an urban illegal on-road parking detection algorithm for highly dynamic video scenarios. Specifically, we first obtain vehicle image information across video frames through multi-vehicle target tracking on in-vehicle videos. Then, we convert the target vehicle image information into relative distance changes in the real scene through dynamic visual ranging and integrate it with inter-vehicle movement to judge illegal parking. Finally, the performance of the proposed algorithm is evaluated on a road dataset from Chongqing. Experimental results show that the proposed algorithm achieves a detection accuracy of 87.1% for illegally parked vehicles, 21.9% higher than three baselines on average, and exhibits excellent detection performance across different illegal parking scenarios.
Hyperspectral Image Denoising Combining Group Sparsity and Representative Coefficient Bidirectional Spatial-Spectral Total Variation
SI Weina, YE Jun, JIANG Bin
Computer Science. 2024, 51 (12): 199-208.  doi:10.11896/jsjkx.231000187
Hyperspectral image denoising is a fundamental problem in the remote sensing field and an important preprocessing step. Denoising methods based on the total variation of representative coefficients are widely used in hyperspectral image (HSI) denoising. The representative coefficient matrix U inherits the prior information of the clean HSI, which achieves global low-rankness and reduces computational complexity. However, because such methods introduce only first-order total variation, they produce a strong staircase effect during denoising and ignore the features shared across bands, so the denoising effect is limited. To solve this problem, a new regularized denoising model of joint group sparsity and representative coefficient bidirectional spatial-spectral total variation (RCBGS) is proposed. Higher-order total variation is introduced to alleviate the staircase effect, and a weighted $\ell_{2,1}$ norm is imposed on the differences of the subspace to fully exploit the features shared across bands beyond global low-rankness, improving the intrinsic group sparsity and overall smoothness of the HSI. Finally, the iteration rules of the proposed method are derived via the alternating direction method of multipliers (ADMM); the peak signal-to-noise ratio achieved by the proposed method improves by 8.79% on average over the comparison methods. Experiments on simulated and real datasets show that the proposed method outperforms related methods in both visual quality and quantitative evaluation.
Artificial Intelligence
Study on Text-based Personality Detection: A Review
ZHU Yangfu, LI Meiling, TAN Jiachen, WU Bin
Computer Science. 2024, 51 (12): 209-222.  doi:10.11896/jsjkx.240500071
Text-based personality detection is an important research topic in the field of personality computing, aiming to analyze the implicit personality traits in user-generated text. With the boom of social networks, people are accustomed to posting online content that reflects their psychological activities, which provides new opportunities for text-based personality detection. Accurately detecting personality traits is important in psychological health diagnosis, public opinion monitoring, human-computer interaction system design, and even in the construction of today’s large language models. This paper provides a comprehensive review of text-based personality detection. Firstly, it introduces the background and task settings of personality detection. Secondly, existing detection methods are categorized into four groups: psycholinguistic statistical methods, feature engineering methods, deep learning methods, and pre-trained language models. Then, the commonly used datasets and model performance are summarized. Finally, the open issues and future research in this field are analyzed from five aspects: reliability, fairness, ethics and privacy, the unification of datasets and evaluation metrics, and the relationship between large language models and personality.
Large Language Model-based Method for Mobile App Accessibility Enhancement
MA Qimin, LI Xiangmin, ZHOU Yaqian
Computer Science. 2024, 51 (12): 223-233.  doi:10.11896/jsjkx.240400077
Mobile application accessibility refers to the degree to which mobile applications are designed and implemented so that any user can easily access them. However, only a small fraction of the vast number of applications in the domestic mobile application market supports accessibility features, which contradicts the vision of bridging the digital divide so that the growing elderly and visually impaired population can enjoy the benefits of the digital age. Large language models (LLMs) have demonstrated significant potential for achieving human-level intelligence; guided by prompts, they can perform simple logical reasoning and decision-making. In addition, shortening the interaction path is an intuitive strategy for enhancing mobile application accessibility. Motivated by these observations, we propose a method for enhancing mobile application accessibility based on LLMs. The method creatively combines accessibility services and LLMs, aiming to improve security, automation, and intelligence. We implement a mobile application accessibility tool called AccessLink. Under the premises of non-invasiveness and user authorization, AccessLink perceives and interacts with the graphical user interfaces of mobile applications. We also develop an automated dataset construction approach. Experimental validation on the constructed dataset with large models such as GPT-3.5, GPT-4.0, QianWen, and Baichuan demonstrates the effectiveness of the proposed method.
Joint Extraction of Entities and Relations Based on Word-Pair Distance Embedding and Axial Attention Mechanism
ZHANG Mengying, SHEN Hailong
Computer Science. 2024, 51 (12): 234-241.  doi:10.11896/jsjkx.231100023
The joint extraction of entities and relations provides key technical support for the construction of knowledge graphs, and the problem of overlapping relations has always been a focus of joint extraction research. Many existing methods use multi-step modeling; although they achieve good results on overlapping relations, they introduce exposure bias. To solve the overlapping relation and exposure bias problems simultaneously, a joint entity and relation extraction method based on word-pair distance embedding and an axial attention mechanism (DE-AA) is proposed. Firstly, table features representing the word-pair relations are constructed, and word-pair distance information is added to optimize their representation. Secondly, an axial attention model based on row attention and column attention is applied to enhance the table features, which reduces computational complexity while fusing global features. Finally, the table features are mapped to each relation space to generate a relation-specific word-pair relation table; a table-filling method assigns a label to each item in the table, and triples are extracted by triple classification. The proposed model is evaluated on the public datasets NYT and WebNLG. Experimental results show that it outperforms the baseline models and has significant advantages in handling overlapping or multiple relations.
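A sketch of axial attention over an L x L word-pair table (shapes and head count are illustrative): full 2-D self-attention over L² cells costs O(L⁴), whereas attending within each row and then within each column costs O(L³) while still propagating information across the whole table.

```python
# Row attention followed by column attention over a (batch, L, L, dim) table.
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, table: torch.Tensor) -> torch.Tensor:
        b, L, _, d = table.shape
        x = table.reshape(b * L, L, d)               # attend along each row
        x = self.row(x, x, x)[0].reshape(b, L, L, d)
        x = x.transpose(1, 2).reshape(b * L, L, d)   # attend along each column
        x = self.col(x, x, x)[0].reshape(b, L, L, d)
        return x.transpose(1, 2)

attn = AxialAttention(dim=32)
print(attn(torch.randn(2, 10, 10, 32)).shape)        # torch.Size([2, 10, 10, 32])
```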
Multi-modal Dual Collaborative Gather Transformer Network for Fake News Detection
XIANG Wang, WANG Jinguang, WANG Yifei, QIAN Shengsheng
Computer Science. 2024, 51 (12): 242-249.  doi:10.11896/jsjkx.231000057
Social media platforms are convenient venues for people to share information, express opinions, and exchange ideas in daily life. With the growing number of users, a large amount of data has emerged on social media websites, but because users rarely verify what they share, the authenticity of that information is hard to guarantee, and a large amount of fake news spreads widely on social media. Existing methods suffer from the following limitations: 1) most rely on simple text and visual feature extraction, concatenating the features to obtain multimodal representations for detecting fake news, while ignoring the fine-grained intrinsic connections within and between modalities and lacking retrieval and filtering of key information; 2) feature extraction is not guided across modalities, with insufficient interaction and understanding between textual and visual features. To address these challenges, a multimodal dual-collaborative gather transformer network (MDCGTN) is proposed. In the MDCGTN model, textual and visual features are extracted by a text-visual encoding network and fed into a multimodal gather transformer network for multimodal information fusion, where a gathering mechanism extracts key information, fully capturing and fusing fine-grained relationships within and between modalities. In addition, a dual-collaborative mechanism is designed to integrate the multimodal information of social media posts, enhancing interaction and understanding between modalities. Extensive experiments on two publicly available benchmark datasets show that, compared with state-of-the-art baseline methods, the proposed MDCGTN achieves significant improvements in accuracy, demonstrating its superior performance in detecting fake news.
Short Text Semantic Matching Strategy Fusing Sememe Similarity Matrix and Dual-channel of Char-Word Vectors
LIU Dongxu, DUAN Liguo, CUI Juanjuan, CHANG Xuanwei
Computer Science. 2024, 51 (12): 250-258.  doi:10.11896/jsjkx.231100147
Short text semantic matching aims to judge whether the semantics of two short sentences are consistent. Existing methods often suffer from shortcomings such as insufficient semantic information in short texts and an inability to effectively recognize synonyms. To address these shortcomings, this paper proposes a short text semantic matching strategy that fuses a sememe similarity matrix with a dual channel of character and word vectors. Firstly, the pre-trained BERT model encodes the input sentence pairs; for word-level semantic information, the FastText model is used to train word vectors for the text, and a BiLSTM model further extracts contextual semantic information. Secondly, to make effective use of the semantic information, multi-head attention and co-attention for the interactive computation of the separated vectors are added to the two channels, and the sememe similarity matrix is integrated into both attention mechanisms. Finally, semantic consistency is inferred from the resulting vectors. Experiments on the financial dataset BQ and the open-domain dataset LCQMC demonstrate the effectiveness of the proposed algorithm.
Feature-weighted Counterfactual Explanation Method: A Case Study in Credit Risk Control Scenarios
WANG Baocai, WU Guowei
Computer Science. 2024, 51 (12): 259-268.  doi:10.11896/jsjkx.240300047
Machine learning is increasingly applied in the financial field, and providing users with interpretable machine learning methods has become an important research topic. In recent years, counterfactual explanation has attracted widespread attention: it improves the interpretability of machine learning models by providing perturbation vectors that change the prediction of a classifier. However, existing methods face feasibility and actionability issues when generating counterfactual instances. This paper proposes a new counterfactual explanation framework that introduces a feature-variable cost weight matrix, taking into account how easy each feature variable is to change so that the counterfactual results are more realistic and feasible. With the cost weight matrix predefined by experts, a practical method for computing the cost weights of feature variables is provided, allowing users to make personalized adjustments according to their actual situation. The objective function jointly considers three indicators, namely feature-weighted distance, sparsity, and proximity, ensuring that counterfactual results are feasible, simple, and close to the original sample set. A genetic algorithm is used to solve the optimization problem and generate the optimal action plan. Experiments on real datasets confirm that, compared with existing counterfactual methods, the proposed method can generate feasible and actionable counterfactual instances.
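A sketch of the three-term objective a genetic algorithm would minimize (toy values throughout: the cost weights stand in for the expert-defined cost weight matrix, and the trade-off coefficients are illustrative):

```python
# Fitness = weighted distance + sparsity + proximity to the training data.
import numpy as np

def fitness(x, cf, X_train, cost_w, lam=(1.0, 0.3, 0.5)):
    delta = cf - x
    weighted_dist = np.abs(delta) @ cost_w          # hard features cost more
    sparsity = np.count_nonzero(delta)              # changed-feature count
    proximity = np.abs(X_train - cf).sum(1).min()   # nearest real sample
    return lam[0] * weighted_dist + lam[1] * sparsity + lam[2] * proximity

x = np.array([0.2, 0.5, 0.9])                        # rejected applicant
cost_w = np.array([1.0, 5.0, 0.5])                   # e.g. income hard to change
X_train = np.array([[0.3, 0.5, 0.4], [0.8, 0.6, 0.2]])
cf = np.array([0.3, 0.5, 0.4])                       # candidate counterfactual
print(fitness(x, cf, X_train, cost_w))               # lower is better
```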
Motif Based Hybrid-order Network Consensus for Multi-agent Systems with Trade-off Parameter Adaptation
XIE Guangqiang, WU Yebin, LI Yang
Computer Science. 2024, 51 (12): 269-276.  doi:10.11896/jsjkx.231100146
Making full use of the high-order information in a multi-agent network structure can effectively promote multi-agent consensus. The motif-aware weighted multi-agent system (MWMS) algorithm focuses on extracting connection information from the complex network while ignoring fragment information, so MWMS converges very differently under different values of the balance parameter. To address this issue, this paper proposes an alpha-adaptive motif-aware weighted multi-agent system (AMWMS) to reveal how the balance parameter regulates multi-agent systems in hybrid-order networks. Firstly, the paper proposes methods for quantifying high-order network fragmentation based on Jaccard similarity and low-order network fragmentation based on relative distance, which model the fragment information of the different network layers. Secondly, an adaptive parameter generation hybrid-order network (APGHNet) is designed, whose balance parameter adapts during system evolution. Finally, a motif-aware weighted multi-agent consensus protocol with trade-off parameter adaptation is proposed. Simulation results show that, compared with the consensus protocol in MWMS, the balance parameter adaptation of the new protocol is effective, and the system eventually converges to fewer clusters, promoting system consensus.
Novel Probability Distribution Update Strategy for Distributed Deep Q-Networks Based on Sigmoid Function
GAO Zhuofan, GUO Wenli
Computer Science. 2024, 51 (12): 277-285.  doi:10.11896/jsjkx.240500082
Building on expected-value DQN, the distributional deep Q-network (Dist-DQN) can handle stochastic rewards in complex environments by extending the discrete action reward to an interval and continuously updating a probability distribution over the support. The distribution update strategy for the reward probabilities, an essential component of any Dist-DQN implementation, significantly affects the agent’s learning efficiency in the environment. A new update strategy, Sig-Dist-DQN, is proposed to address this issue. The strategy takes into account the strength of the correlation between subsets of the reward probabilities, raising the update rate of the probability mass for strongly correlated subsets while lowering it for weakly correlated ones. In experiments conducted in environments provided by OpenAI Gym, the exponential and harmonic-series update strategies vary significantly from one training run to another, while the training curves of the Sig-Dist-DQN strategy are very stable. Compared with the exponential and harmonic-series update strategies, an agent using Sig-Dist-DQN shows significantly improved convergence speed and stability of the loss function during learning.
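The exact Sig-Dist-DQN rule is the paper's contribution and is not reproduced here; the sketch below only illustrates the underlying idea under our assumptions: each support atom's probability mass is pulled toward the observed reward with a step size shaped by a sigmoid of its distance, so strongly correlated atoms update quickly and weakly correlated ones slowly. Temperature and step size are toy values.

```python
# Sigmoid-shaped update rates over the reward support of a distributional DQN.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_distribution(probs, support, reward, lr=0.5, tau=0.5):
    dist = np.abs(support - reward)
    rate = lr * sigmoid((dist.mean() - dist) / tau)   # near atoms -> high rate
    target = np.zeros_like(probs)
    target[np.argmin(dist)] = 1.0                     # one-hot at the reward
    probs = probs + rate * (target - probs)
    return probs / probs.sum()                        # renormalize

support = np.linspace(-2, 2, 11)                      # reward support atoms
probs = np.full(11, 1 / 11)
for _ in range(20):
    probs = update_distribution(probs, support, reward=1.0)
print(support[np.argmax(probs)])                      # mass concentrates near 1.0
```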
Construction and Evaluation of Intelligent Question Answering System for Electric Power Knowledge Base Based on Large Language Model
ZHANG Jinying, WANG Tiankun, YAO Changying, XIE Hua, CHAI Linzheng, LIU Shukai, LI Tongliang, LI Zhoujun
Computer Science. 2024, 51 (12): 286-292.  doi:10.11896/jsjkx.240300104
Large language models are a major breakthrough in natural language processing in recent years and have become a new research paradigm in the field. In vertical domains such as finance and law, domain-specific intelligent question answering systems built on large models, such as FinGPT and ChatLaw, have advanced both academic research and the application of large-model technology. However, the lack of high-quality data in the electric power domain has been a major obstacle to building comparable question answering systems there. To build an intelligent question answering system for the electric power domain, ChatPower, an intelligent question answering system for an electric power knowledge base based on a large language model, is proposed. To ensure answer quality, ChatPower makes full use of data from all aspects of power management, organizes and integrates a large amount of professional power knowledge through semantic understanding, and carefully constructs a large-scale power system knowledge base covering power-related rules and regulations, production safety management systems, and knowledge of power generation equipment failures. Furthermore, by grounding answers in retrieved power knowledge, ChatPower significantly reduces model hallucination in question answering, and its retrieval system combines BM25 retrieval, dense retrieval, and reranking, effectively reducing the inaccuracy of relying on vector-store retrieval alone. ChatPower also applies prompt engineering on top of the large model to improve the organization of answers to questions about rules and regulations. To evaluate the system, a test dataset for electric power knowledge question answering is constructed, and ChatPower is tested and verified on it. The test results show that ChatPower effectively improves the accuracy of power-related knowledge retrieval and question answering.
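A toy sketch of the hybrid retrieval step described above: BM25 lexical scores and dense cosine scores are min-max normalized and fused before reranking. The documents, dense scores, and fusion weight are illustrative; the real system uses a learned encoder and a dedicated reranker.

```python
# Fuse a from-scratch BM25 score with stand-in dense scores.
import math
import numpy as np

docs = [["transformer", "overheating", "alarm", "handling"],
        ["generator", "bearing", "failure", "procedure"],
        ["transformer", "oil", "level", "inspection"]]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    N, avgdl = len(docs), sum(map(len, docs)) / len(docs)
    scores = np.zeros(N)
    for t in query:
        df = sum(t in d for d in docs)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        for i, d in enumerate(docs):
            f = d.count(t)
            scores[i] += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def minmax(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

query = ["transformer", "alarm"]
dense = np.array([0.82, 0.10, 0.55])         # stand-in for cosine similarities
fused = 0.4 * minmax(bm25_scores(query, docs)) + 0.6 * minmax(dense)
print(fused.argsort()[::-1])                 # candidate order passed to reranker
```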
Information Security
Zero Trust Anonymous Access Scheme Based on Software-defined Perimeters
LI Weixian, ZHANG Jianhui, ZENG Junjie, JIA Hongyong, MEN Ruirui
Computer Science. 2024, 51 (12): 293-302.  doi:10.11896/jsjkx.231000176
Abstract PDF(2131KB) ( 143 )   
References | Related Articles | Metrics
Software-defined perimeters,as a highly scalable and secure zero-trust security architecture,have gained widespread adoption.Conventional software-defined perimeter(SDP) architectures employ a single packet authorization(SPA) mechanism to achieve resource hiding and visitor identity validation.However,existing solutions often store and distribute SDP keys in a centralized manner and lack robust protection for visitor privacy.In response to these challenges,a zero-trust anonymous access scheme within the software-defined perimeter architecture is proposed.The scheme utilizes a three-party key agreement for SDP key distribution and employs generalized designated verifier signatures for anonymous visitor identity authentication.Moreover,it demonstrates resilience against network attacks such as SPA key theft,port knocking amplification attacks,and identity spoofing,thus exhibiting enhanced security compared to existing software-defined perimeter schemes.Experimental findings reveal a 33% reduction in communication overhead and a 20% decrease in average authentication latency in multi-node network environments.
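For readers unfamiliar with single packet authorization,the sketch below shows the generic SPA pattern the scheme builds on:a client proves knowledge of a key inside a single UDP-sized payload,and the gateway silently drops anything that fails verification so the protected service stays invisible.This is a generic HMAC-based illustration;the paper's scheme derives the key from a three-party key agreement and authenticates visitors with generalized designated verifier signatures,neither of which is reproduced here.

import hmac, hashlib, os, struct, time

PSK = os.urandom(32)   # stand-in key; the paper obtains it from a
                       # three-party key agreement instead

def make_spa_packet(client_id: bytes, key: bytes) -> bytes:
    ts = struct.pack(">Q", int(time.time()))      # anti-replay timestamp
    nonce = os.urandom(16)
    body = client_id + ts + nonce
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag

def verify_spa_packet(pkt: bytes, client_id: bytes, key: bytes, skew=30) -> bool:
    body, tag = pkt[:-32], pkt[-32:]
    if not hmac.compare_digest(hmac.new(key, body, hashlib.sha256).digest(), tag):
        return False                               # drop silently: stay "dark"
    ts = struct.unpack(">Q", body[len(client_id):len(client_id) + 8])[0]
    return abs(time.time() - ts) <= skew           # reject stale knocks

pkt = make_spa_packet(b"client-01", PSK)
assert verify_spa_packet(pkt, b"client-01", PSK)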
Cryptomining Malware Early Detection Method Based on SDR
ZHONG Kai, GUO Chun, LI Xianchao, SHEN Guowei
Computer Science. 2024, 51 (12): 303-309.  doi:10.11896/jsjkx.231200041
Abstract PDF(1613KB) ( 125 )   
References | Related Articles | Metrics
Cryptomining malware aims to steal computing resources from devices to mine cryptocurrency,seriously compromising network security while consuming a large amount of computing resources.Current dynamic detection methods for cryptomining malware mainly rely on host behavior or network traffic collected during a long sample run,which fails to balance the timeliness and accuracy of detection.By analyzing the DLLs(dynamic link libraries) and the return values of the APIs called by cryptomining malware in the early stage of operation,we propose an API sentence embedding method based on DLL and API return value(SDR),and further propose a cryptomining malware early detection method based on SDR(CEDS).CEDS uses SDR to convert the API name sequences,API return value sequences and DLL sequences generated in the early stage of software operation into sentence vector sequences,and uses TextCNN to build a model for early detection of cryptomining malware.Experimental results show that CEDS can determine whether a software sample is cryptomining malware or benign software with an average time of 0.5106s and an accuracy of 96.75%.
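The abstract names TextCNN as the classifier over SDR sentence-vector sequences.Below is a generic PyTorch TextCNN sketch of that stage;the embedding dimension,filter counts and kernel sizes are assumed values,not those reported in the paper.

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    # Generic TextCNN over a sequence of sentence vectors (SDR output).
    def __init__(self, dim=128, n_filters=64, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 2)  # mining / benign

    def forward(self, x):           # x: (batch, seq_len, dim)
        x = x.transpose(1, 2)       # Conv1d expects (batch, dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN()
logits = model(torch.randn(8, 50, 128))   # 8 samples, 50 early-stage API calls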
Proxy Provable Data Possession with Key-exposure Resilience
AN Ruicheng, WANG Huaqun
Computer Science. 2024, 51 (12): 310-316.  doi:10.11896/jsjkx.231100085
Abstract PDF(1699KB) ( 111 )   
References | Related Articles | Metrics
More and more clients would like to store their data on public cloud servers along with the rapid development of cloud storage.To check the integrity of remote data,researchers proposed provable data possession(PDP).In some cases,the client is restricted from accessing the Internet,for example on an ocean-going vessel or while participating in classified projects,and has to delegate the remote data possession checking task to a proxy.However,in proxy PDP,once the client's private key is exposed,auditing schemes inevitably become unable to work.To solve these problems,the proposed scheme combines key insulation with proxy PDP and introduces a physically-secure but computationally-limited helper into the system model.The helper generates an update message in each time period and sends it to the client to help the client compute the signing key for the current time period.In this scheme,adversaries cannot forge user-generated authenticators for time periods in which the key is not exposed.Security analysis and performance analysis show that the proposed scheme is secure and efficient.
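The key-insulation workflow(the helper sends a per-period update message and the client derives the period signing key from it) can be illustrated with a deliberately simplified HMAC-based stand-in.The paper's actual construction works over algebraic groups with provable security;the sketch below only shows the division of roles,and all names are hypothetical.

import hashlib, hmac, os

helper_key = os.urandom(32)   # stored only on the physically-secure helper
base_key = os.urandom(32)     # client's long-term secret

def helper_update_message(period: int) -> bytes:
    # The helper derives a per-period update message and sends it to the client.
    return hmac.new(helper_key, b"period:%d" % period, hashlib.sha256).digest()

def client_signing_key(period: int, update_msg: bytes) -> bytes:
    # The client combines its secret with the helper's message; a key exposed
    # in one period does not reveal the signing keys of other periods.
    return hmac.new(base_key, update_msg + b"%d" % period, hashlib.sha256).digest()

sk_5 = client_signing_key(5, helper_update_message(5))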
CP-ABE Scheme Supporting Fully Hidden Policies and Attributes
JIANG Luhan, TIAN Youliang, XIANG Axin
Computer Science. 2024, 51 (12): 317-325.  doi:10.11896/jsjkx.231000056
Abstract PDF(2282KB) ( 136 )   
References | Related Articles | Metrics
The existing ciphertext-policy attribute-based encryption(CP-ABE) schemes that support policy or attribute hiding can achieve fine-grained access control for privacy protection,but most of them only realize partial policy hiding of attribute values and ignore the problem of hiding user attributes during key generation,so user privacy information is still prone to leakage.To address these problems,a CP-ABE scheme that fully hides the access policy and user attributes is proposed for data access control and for protecting user privacy during key generation.Firstly,the attribute Morton filter(AMF) is proposed:the access policy is fully hidden in the AMF during the encryption phase,and the user can efficiently query and accurately determine the position of attributes in the policy during the decryption phase.Secondly,a zk-SNARKs-based key generation method is developed to effectively conceal the user attributes throughout the key generation process.Finally,security and performance analyses are conducted,demonstrating that the scheme achieves indistinguishability under chosen-plaintext attack without compromising efficiency.
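A Morton filter is a compressed cuckoo-filter variant;to keep the sketch short,a plain Bloom filter is swapped in below to illustrate the idea the AMF relies on:policy attributes are stored only as hashed bits,so membership can be queried without the policy itself being revealed.The class,parameters and example attributes are all illustrative assumptions.

import hashlib

class BloomFilter:
    # Simplified stand-in for the paper's attribute Morton filter (AMF).
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, attr: str):
        for p in self._positions(attr):
            self.bits[p] = 1

    def query(self, attr: str) -> bool:
        return all(self.bits[p] for p in self._positions(attr))

amf = BloomFilter()
for a in ["dept:cardiology", "role:doctor"]:   # hidden policy attributes
    amf.add(a)
print(amf.query("role:doctor"), amf.query("role:nurse"))  # True False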
Fine-grained Vulnerability Detection Based on Hierarchical Attention Networks and Integrated Gradients
LI Qiuyue, HAN Daojun, ZHANG Lei, XU Tao
Computer Science. 2024, 51 (12): 326-333.  doi:10.11896/jsjkx.231000174
Abstract PDF(2096KB) ( 128 )   
References | Related Articles | Metrics
Smart contracts are decentralized applications that run on blockchain platforms and are widely used in many fields,including digital currencies,the Internet of Things,and supply chains.Research on vulnerability detection in smart contracts is of great importance for securing digital assets and maintaining the reliability and stability of contracts.One current mainstream approach is to use deep learning models to automatically learn code features and thereby detect vulnerabilities in smart contracts.It achieves high accuracy,but is limited in vulnerability interpretation and cannot provide fine-grained vulnerability information.To address the problems that current deep learning-based smart contract vulnerability detection models cannot effectively provide fine-grained vulnerability explanations and that fine-grained labels are lacking,a fine-grained vulnerability detection method based on hierarchical attention networks and integrated gradients is proposed.Hierarchical attention networks are used for coarse-grained vulnerability detection:a word attention encoding layer and a function attention encoding layer are built from two attention layers to learn function-level and contract-level representations of the source code,respectively,attending to the individual tokens and statements of the code.The integrated gradients method is then used to provide fine-grained explanations by calculating the contribution of code statements to the vulnerability prediction,thereby identifying the statements related to a vulnerability and realizing word-level and statement-level vulnerability interpretation without requiring statement-level labels.Experimental results on the real Ethereum datasets SmartbugsWilds,SmartbugsCurated and SolidiFIBenchmark show that the proposed method achieves an average accuracy of more than 80% on five vulnerability types,with a 6% improvement in the accuracy of vulnerability interpretation,and can locate vulnerable code more accurately and help developers review contracts.
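The integrated gradients step has a standard closed form,IG_i(x) = (x_i - x'_i) * \int_0^1 ∂F(x' + α(x - x'))/∂x_i dα,approximated by a Riemann sum.Below is a minimal PyTorch sketch of that attribution step with a toy stand-in model;the real detector is the hierarchical attention network,and the shapes used here are assumptions.

import torch

def integrated_gradients(model, x, baseline=None, steps=50, target=0):
    # Riemann-sum approximation of the integrated-gradients path integral.
    if baseline is None:
        baseline = torch.zeros_like(x)
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        out = model(point)[..., target].sum()
        total += torch.autograd.grad(out, point)[0]
    return (x - baseline) * total / steps

# Toy stand-in for the detector, applied to 20 statement embeddings of width 16.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(20 * 16, 2))
x = torch.randn(1, 20, 16)
attr = integrated_gradients(model, x)
stmt_scores = attr.abs().sum(dim=2)   # rank statements by total contribution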
Zero Day Attack Detection Method for Internet of Vehicles
WANG Bo, ZHAO Jincheng, XU Bingfeng, HE Gaofeng
Computer Science. 2024, 51 (12): 334-342.  doi:10.11896/jsjkx.231000117
Abstract PDF(2840KB) ( 221 )   
References | Related Articles | Metrics
Zero-day attack detection in the Internet of Vehicles usually adopts anomaly-based methods due to the limited availability of attack data.Nevertheless,the complex and diverse driving environments that vehicles operate in,coupled with the variability of behavioral patterns,result in significant deviations in normal behavior.As a consequence,anomaly-based methods tend to yield elevated false alarm rates.In the vehicular context,the attack principles of zero-day and known attacks exhibit similarities.Drawing inspiration from transfer learning,a zero-day attack detection method for the Internet of Vehicles is introduced,which is grounded in few-shot learning and employs conditional generative adversarial networks(CGANs).Specifically,a conditional generative adversarial network model featuring multiple generators and multiple discriminators is proposed.Within this framework,an adaptive sampling data augmentation method is developed to enhance the dataset with known attack samples;this augmentation optimizes the input samples so as to effectively reduce false positives.Furthermore,to address the data imbalance stemming from the limited number of input attack samples,a collaborative focal loss function is incorporated into the discriminators,with an emphasis on distinguishing challenging-to-classify data.The effectiveness of the proposed method is assessed through comprehensive experiments on the F2MD vehicular network simulation platform.The experimental results establish the superiority of the proposed approach over existing methods in terms of both detection efficacy and latency,presenting an effective solution for zero-day attack detection in the Internet of Vehicles.
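The abstract's collaborative focal loss is not specified in detail;the sketch below shows the standard focal loss it builds on,FL(p_t) = -α(1 - p_t)^γ·log(p_t),which down-weights easy samples so the discriminators focus on hard-to-classify data.The α and γ values are common defaults,not the paper's settings.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard focal loss: scale cross-entropy by (1 - p_t)^gamma so that
    # well-classified (easy) samples contribute little to the gradient.
    ce = F.cross_entropy(logits, targets, reduction="none")  # equals -log(p_t)
    p_t = torch.exp(-ce)
    return (alpha * (1 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(16, 2), torch.randint(0, 2, (16,)))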
Fuzzy Labeled Private Set Intersection Protocol
CHENG Enze, ZHANG Lei, WEI Lifei
Computer Science. 2024, 51 (12): 343-351.  doi:10.11896/jsjkx.231000131
Abstract PDF(2260KB) ( 187 )   
References | Related Articles | Metrics
Fuzzy labeled private set intersection(FLPSI) is a variant of PSI in which the elements of the sender's and receiver's sets are not identical but merely similar.Each element in the sender's set is associated with a label,and the receiver obtains only the labels of the matched elements,without learning any other information.Most existing FLPSI protocols use Hamming distance to determine the degree of matching between binary vectors and are built on expensive public-key primitives,which require high computation overhead and result in slow running times.This paper proposes an efficient FLPSI protocol based on symmetric cryptography and proves its security in the semi-honest model,ensuring that participants cannot obtain additional data.Compared with existing schemes,the protocol reduces the overall communication complexity and lowers the computational complexity of the sender from O(n^2) to O(n).Experimental simulation shows that in balanced scenarios the proposed protocol is 3~10x faster than existing FLPSI protocols,with communication reduced by 89% to 95%;in unbalanced scenarios it is 7~10x faster than existing FLPSI protocols and also exhibits obvious advantages over similar fuzzy matching protocols.In addition,an application of the FLPSI protocol to privacy-preserving face recognition is designed,which can meet the requirements of different scenarios by adjusting parameters.
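The plaintext functionality that FLPSI computes can be stated in a few lines:the receiver learns the labels of sender elements whose binary vectors are within Hamming distance t of its own vector,and nothing else.The sketch below shows only this plaintext matching rule;in the actual protocol the comparison is performed obliviously under symmetric-key cryptography,so the sender's vectors and unmatched labels are never revealed.The example vectors and threshold are illustrative.

def hamming(a: int, b: int) -> int:
    return (a ^ b).bit_count()      # popcount of the XOR (Python 3.10+)

# Sender's set: (binary feature vector, label); receiver's probe vector.
sender = [(0b1011_0110, "label-A"), (0b0100_1001, "label-B")]
probe = 0b1011_0010
threshold = 2                       # vectors within distance t "match"

matches = [lbl for vec, lbl in sender if hamming(vec, probe) <= threshold]
print(matches)                      # ['label-A']  (distance 1)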
Adaptive MSB Reversible Data Hiding Based Secure Deduplication for Encrypted Images in Cloud Storage
ZHOU Yiteng, TANG Xin, JIN Luchao
Computer Science. 2024, 51 (12): 352-360.  doi:10.11896/jsjkx.231100087
Abstract PDF(2377KB) ( 157 )   
References | Related Articles | Metrics
With the rapid development of information technologies,more and more multimedia data,represented by images,are repeatedly uploaded to the cloud for storage,resulting in a great waste of communication and storage overhead.In addition,plaintext images stored directly in the cloud raise the problem of confidentiality breach.Although ciphertext deduplication is an effective means to deal with these problems,the differentiated response it produces actually creates a side channel for attackers,which puts the existence privacy of data in cloud storage at risk.At the same time,a huge amount of extra overhead is required to transfer keys between data owners.Thus,this paper proposes an efficient adaptive MSB reversible data hiding based secure deduplication scheme(EMSD),which is able to effectively resist side-channel attacks while saving communication and storage overhead.Specifically,reversible data hiding for encrypted images is innovatively introduced into ciphertext deduplication,and the auxiliary information for key transfer is embedded into the encrypted images before they are sent to the cloud,so the extra communication and storage overhead for auxiliary information is eliminated.Furthermore,the existing deduplication scheme is optimized to ensure that even if the image in a deduplication request is not a duplicate,no extra ciphertext upload is needed,thus achieving indistinguishable responses.Security analysis and experimental results show that the proposed scheme can resist side-channel attacks in a lightweight way compared with existing schemes.
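As a rough illustration of the MSB-embedding step(not the paper's adaptive,prediction-based construction,which also keeps a location map for perfect reversibility),the numpy sketch below writes key-transfer auxiliary bits into the most significant bits of an encrypted image and reads them back.All names and sizes are illustrative.

import numpy as np

def embed_msb(enc_img: np.ndarray, bits: np.ndarray) -> np.ndarray:
    # Write auxiliary bits into the MSBs of the first len(bits) pixels
    # of an 8-bit encrypted image (reversibility bookkeeping omitted).
    out = enc_img.copy().ravel()
    out[:bits.size] = (out[:bits.size] & 0x7F) | (bits.astype(np.uint8) << 7)
    return out.reshape(enc_img.shape)

def extract_msb(enc_img: np.ndarray, n: int) -> np.ndarray:
    return (enc_img.ravel()[:n] >> 7) & 1

enc = np.random.randint(0, 256, (8, 8), dtype=np.uint8)   # stand-in ciphertext
aux = np.random.randint(0, 2, 16)                          # key-transfer bits
marked = embed_msb(enc, aux)
assert np.array_equal(extract_msb(marked, 16), aux)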