Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
Current Issue
Volume 49 Issue 12, 15 December 2022
Federated Learning
Study on Transmission Optimization for Hierarchical Federated Learning
ZOU Sai-lan, LI Zhuo, CHEN Xin
Computer Science. 2022, 49 (12): 5-16.  doi:10.11896/jsjkx.220300204
Compared with traditional machine learning, federated learning effectively solves the problems of user data privacy and security protection, but the large number of model exchanges between massive nodes and cloud servers produces high communication costs. Therefore, cloud-edge-device hierarchical federated learning has received increasing attention. In hierarchical federated learning, D2D and opportunistic communication can be used for cooperative model training among mobile nodes; edge servers perform local model aggregation, while the cloud server performs global model aggregation. To improve the convergence rate of the model, network transmission optimization techniques for hierarchical federated learning are studied. This paper introduces the concept and algorithmic principles of hierarchical federated learning, summarizes the key challenges that cause network communication overhead, and analyzes six network transmission optimization methods: selecting appropriate nodes, enhancing local computing, reducing the number of uploaded local model updates, compressing model updates, decentralized training, and aggregation-oriented parameter transmission. Finally, future research directions are summarized and discussed.
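To make the cloud-edge-device pattern concrete, the following minimal FedAvg-style sketch (our illustration, not the paper's code; models are flat parameter vectors and the dataset sizes are invented) shows two-level aggregation: each edge server averages its devices' models, then the cloud averages the edge results.

```python
import numpy as np

def aggregate(models, weights):
    """Weighted average of model parameter vectors (FedAvg-style)."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, models))

# Hypothetical setup: two edge servers, each covering a few devices whose
# local models are flat parameter vectors; "sizes" are local dataset sizes.
edge_groups = [
    {"models": [np.random.randn(4) for _ in range(3)], "sizes": [120, 80, 200]},
    {"models": [np.random.randn(4) for _ in range(2)], "sizes": [150, 50]},
]

edge_models = [aggregate(g["models"], g["sizes"]) for g in edge_groups]  # edge step
edge_sizes = [sum(g["sizes"]) for g in edge_groups]
global_model = aggregate(edge_models, edge_sizes)                        # cloud step
print(global_model)
```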
Storage Task Allocation Algorithm in Decentralized Cloud Storage Network
SHEN Zhen, ZHAO Cheng-gui
Computer Science. 2022, 49 (12): 17-21.  doi:10.11896/jsjkx.220700131
To construct a novel model for the storage task allocation problem of federated learning client datasets, ensure load balancing of the decentralized cloud storage network, shorten data upload and recovery times, and reduce the total client storage cost, a data storage task allocation algorithm, URGL_allo (allocation based on user requirements and global load), that considers client requirements and global load is proposed. In the node allocation phase, node resources such as global load, topological attributes, storage price, and the data recovery time of concern to clients are considered, and a new node ranking method, inspired by the law of gravity, is defined to select the best storage task allocation node. In the link allocation phase, shortest paths from the client node to the other nodes in the network are computed with Dijkstra's algorithm, and the path with the largest bandwidth value in the set of shortest paths between two nodes is selected for allocation. Simulation results show that the proposed algorithm reduces the load balancing index and the total client storage cost by 41.9% and 5%, respectively, compared with the random policy-based allocation algorithm (Random_allo), and its data recovery time differs little from that of the link bandwidth-based greedy algorithm, with both stably maintained within (0,2], about 1/20 of Random_allo's. Its combined performance on global load and service quality is better than that of the comparison algorithms.
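The link allocation step can be pictured with a small sketch (an illustration under assumed names, not URGL_allo itself; networkx's hop-count shortest paths stand in for the Dijkstra computation, and the edge attribute "bandwidth" is invented): among all shortest paths between two nodes, pick the one whose bottleneck bandwidth is largest.

```python
import networkx as nx

def widest_shortest_path(G, src, dst):
    """Among all hop-count-shortest src-dst paths, return the one whose
    bottleneck (minimum edge bandwidth) is largest."""
    def bottleneck(path):
        return min(G[u][v]["bandwidth"] for u, v in zip(path, path[1:]))
    return max(nx.all_shortest_paths(G, src, dst), key=bottleneck)

G = nx.Graph()
G.add_edge("client", "a", bandwidth=10)
G.add_edge("client", "b", bandwidth=40)
G.add_edge("a", "store", bandwidth=30)
G.add_edge("b", "store", bandwidth=20)
print(widest_shortest_path(G, "client", "store"))  # ['client', 'b', 'store']
```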
Study on Privacy-preserving Nonlinear Federated Support Vector Machines
YANG Hong-jian, HU Xue-xian, LI Ke-jia, XU Yang, WEI Jiang-hong
Computer Science. 2022, 49 (12): 22-32.  doi:10.11896/jsjkx.220500240
Federated learning offers new ideas for solving the problem of multi-party joint modeling over “data silos”. Federated support vector machines can realize cross-device support vector machine modeling without data leaving the local device, but existing research has defects such as insufficient privacy protection in the training process and a lack of research on nonlinear federated support vector machines. To solve these problems, this paper uses the random Fourier feature method and the CKKS homomorphic encryption scheme to propose a privacy-preserving nonlinear federated support vector machine training (PPNLFedSVM) algorithm. Firstly, the same Gaussian kernel approximate mapping function is generated locally by each participant based on the random Fourier feature method, and each participant's training data is explicitly mapped from the low-dimensional space to a high-dimensional space. Secondly, a model parameter security aggregation algorithm based on the CKKS cryptosystem ensures the privacy of model parameters and their contributions during the model aggregation process, and the parameter aggregation process is optimized according to the characteristics of the CKKS cryptosystem to improve the efficiency of the security aggregation algorithm. Security analysis and experimental results show that the PPNLFedSVM algorithm can ensure the privacy of participant model parameters and their contributions during training without loss of model accuracy.
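The mapping step builds on the standard random Fourier feature construction for the Gaussian kernel (Rahimi and Recht); the sketch below is a generic illustration rather than the paper's code, with a shared seed as one way for every participant to generate the same map locally.

```python
import numpy as np

def make_rff_map(d_in, d_out, sigma, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). A shared seed lets all
    participants build an identical map without exchanging data."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / sigma, size=(d_in, d_out))
    b = rng.uniform(0.0, 2 * np.pi, size=d_out)
    return lambda X: np.sqrt(2.0 / d_out) * np.cos(X @ W + b)

phi = make_rff_map(d_in=5, d_out=2048, sigma=1.0, seed=42)
x, y = np.random.randn(5), np.random.randn(5)
approx = (phi(x[None]) @ phi(y[None]).T).item()  # inner product in feature space
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)      # true RBF kernel value
print(approx, exact)                              # close for large d_out
```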
Federated Data Augmentation Algorithm for Non-independent and Identical Distributed Data
QU Xiang-mou, WU Ying-bo, JIANG Xiao-ling
Computer Science. 2022, 49 (12): 33-39.  doi:10.11896/jsjkx.220300031
In federated learning, the local data distribution of users changes with users' locations and preferences, and non-independent and identically distributed (Non-IID) data may lack some label categories, which significantly affects the update rate and the performance of the global model in federated aggregation. To solve this problem, a federated data augmentation algorithm based on a conditional generative adversarial network (FDA-cGAN) is proposed, which can augment data for participants with skewed data without compromising user privacy and greatly improves performance on Non-IID data. Experimental results show that, compared with the current mainstream federated averaging algorithm, under the Non-IID setting the prediction accuracy on the MNIST and CIFAR-10 datasets improves by 1.18% and 14.6%, respectively, which demonstrates the effectiveness and practicability of the proposed algorithm for Non-IID data problems in federated learning.
Efficient Federated Learning Scheme Based on Background Optimization
GUO Gui-juan, TIAN Hui, WANG Tian, JIA Wei-jia
Computer Science. 2022, 49 (12): 40-45.  doi:10.11896/jsjkx.220600237
Federated learning can effectively ensure data privacy and security because it trains data locally on the client, and research on federated learning has made great progress. However, owing to non-independent and identically distributed data and unbalanced data volumes and types, clients inevitably suffer from problems such as insufficient accuracy and low training efficiency when training with local data. To deal with the reduced efficiency of federated learning caused by differences in the training background, this paper proposes an efficient federated learning scheme based on background optimization to improve the accuracy of local models on terminal devices, thereby reducing communication cost and improving the training efficiency of the whole model. Specifically, a first device and a second device are selected according to the difference in accuracy in different environments, and the irrelevance between the first device's model and the global model (hereafter referred to as the difference value) is taken as the standard difference value; whether the second device uploads its local model is determined by its difference value relative to the first device's. Experimental results show that, compared with traditional federated learning, the proposed scheme outperforms the federated averaging algorithm in common federated learning scenarios, improving accuracy by about 7.5% on the MNIST dataset and by about 10% on the CIFAR-10 dataset.
Survey of Incentive Mechanism for Federated Learning
LIANG Wen-ya, LIU Bo, LIN Wei-wei, YAN Yuan-chao
Computer Science. 2022, 49 (12): 46-52.  doi:10.11896/jsjkx.220500272
Federated learning (FL) is driven by multi-party data participation: participants and a central server continuously exchange model parameters rather than directly uploading raw data, achieving data sharing together with privacy protection. In practical applications, the accuracy of the FL global model relies on the participation of multiple stable, high-quality clients, but the data quality of participating clients is imbalanced, which can leave clients in an unfair position in the training process or keep them from participating at all. Therefore, how to motivate clients to participate in federated learning actively and reliably is key to ensuring that FL is widely promoted and applied. This paper introduces the necessity of incentive mechanisms in FL and, according to the sub-problems of incentive mechanisms in the FL training process, divides existing research into incentive mechanisms based on contribution measurement, client selection, payment allocation, and joint optimization of multiple sub-problems; it analyzes and compares existing incentive schemes, summarizes the challenges in the development of incentive mechanisms on this basis, and explores future research directions for FL incentive mechanisms.
Federated Learning Optimization Method for Dynamic Weights in Edge Scenarios
CHENG Fan, WANG Rui-jin, ZHANG Feng-li
Computer Science. 2022, 49 (12): 53-58.  doi:10.11896/jsjkx.220700136
As a new computing paradigm, edge computing provides computing and storage services at the edge of the network, in contrast to the traditional cloud computing model, and features high reliability and low latency; however, problems remain in privacy protection and data processing. As a distributed machine learning model, federated learning can well solve the problems of inconsistent data distribution and data privacy in edge computing scenarios, but it still faces challenges in device heterogeneity, data heterogeneity, and communication, such as model drift, poor convergence, and the loss of some devices' computation results. To solve these problems, a federated learning optimization algorithm with dynamic weights (FedDw) is proposed. It focuses on the service quality of devices, reduces the heterogeneity impact caused by the partial participation of devices due to inconsistent training speeds, and determines each device's proportion in the final model aggregation according to its service quality, ensuring that the aggregation results are more robust in complex real situations. Experiments comparing FedDw with two strong federated learning algorithms, FedProx and Scaffold, on real datasets from 10 regional weather stations show that FedDw has better comprehensive performance.
Multi-dimensional Resource Dynamic Allocation Algorithm for Internet of Vehicles Based on Federated Learning
WU Yun-han, BAI Guang-wei, SHEN Hang
Computer Science. 2022, 49 (12): 59-65.  doi:10.11896/jsjkx.211000123
Considering that multi-dimensional resource consumption in Internet of Vehicles systems fluctuates over time, and that users demand efficient computing services along with data privacy and security, this paper proposes a federated learning-based multi-dimensional resource allocation method for the Internet of Vehicles. On the one hand, the allocation of computing, cache, and bandwidth resources is considered jointly to ensure the completion rate of computing tasks and avoid redundant allocation of multi-dimensional resources; for this purpose, a deep learning algorithm is designed to predict the consumption of each resource from data collected by edge servers. On the other hand, considering the data island problem caused by users' data privacy and security requirements, a federated learning architecture is adopted to obtain a neural network model with better generalization. The proposed algorithm can adjust the allocation of multi-dimensional resources over time to meet time-varying resource requirements and ensure the efficient completion of computing tasks in the Internet of Vehicles system. Experimental results show that the algorithm converges quickly, generalizes well, and completes federated aggregation with fewer communication rounds.
FL-GRM:Gamma Regression Algorithm Based on Federated Learning
GUO Yan-qing, LI Yu-hang, WANG Wan-wan, FU Hai-yan, WU Ming-kan, LI Yi
Computer Science. 2022, 49 (12): 66-73.  doi:10.11896/jsjkx.220600034
In many areas, including hydrology, meteorology, and insurance claims, the dependent variable is commonly assumed to follow a Gamma distribution. Under this assumption, the Gamma regression model achieves an outstanding fitting effect compared with the multivariate linear regression model. Previous studies can obtain a Gamma regression model trained only on a single public dataset; when the data are provided by multiple parties, how can a Gamma regression model be trained without exchanging the data itself, so as to protect data privacy? This paper presents a secure multi-party federated Gamma regression algorithm. Firstly, the log-likelihood function is derived with an iterative method. Secondly, the link function is determined accordingly, and the gradient update strategy is constructed from the loss function. Finally, the parameters are updated under homomorphic encryption to complete training. The model is tested on two public datasets, and the results show that, under the premise of privacy protection, the method can effectively exploit the value of multi-party data to build a Gamma regression model. Its fitting performance is better than that of a Gamma regression model trained by any single party and is close to that of a model trained on centralized data.
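For orientation, a Gamma GLM with a log link models E[y|x] = exp(x·w), and up to the shape parameter the gradient of the negative log-likelihood takes the simple form below. This is a plain centralized sketch for illustration, not the paper's federated, homomorphically encrypted protocol.

```python
import numpy as np

def gamma_glm_gradient(w, X, y):
    """Gamma NLL with log link, up to the constant shape parameter:
    mu = exp(Xw), dNLL/dw = X^T (1 - y/mu) / n."""
    mu = np.exp(X @ w)
    return X.T @ (1.0 - y / mu) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([0.5, -0.3, 0.2])
y = rng.gamma(shape=2.0, scale=np.exp(X @ w_true) / 2.0)  # E[y] = exp(X w_true)

w = np.zeros(3)
for _ in range(500):                      # plain gradient descent
    w -= 0.1 * gamma_glm_gradient(w, X, y)
print(w)                                   # approaches w_true
```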
Fault Detection and Diagnosis of HVAC System Based on Federated Learning
WANG Xian-sheng, YAN Ke
Computer Science. 2022, 49 (12): 74-80.  doi:10.11896/jsjkx.220700280
Automated and accurate fault detection and diagnosis of HVAC systems is one of the most important technologies for reducing time, energy, and financial costs in building performance management. In recent years, data-driven fault detection and diagnosis methods for HVAC systems have been heavily studied. However, most existing works deal with a single system and cannot perform cross-system fault diagnosis. This paper proposes a federated learning-based fault detection and diagnosis method, which uses convolutional neural networks to extract informative features, aggregates features using specially designed algorithms, and performs cross-level and cross-system fault detection and diagnosis via federated learning. For multi-level fault detection and diagnosis, federated learning is performed using data from four fault severity levels of chillers; experimental results show that the average F1-score across the four fault levels is close to 0.97, which is within the practical range. For cross-system fault detection and diagnosis, federated learning uses chiller and air handling unit data; experimental results show that federated learning over data from different systems improves the diagnosis results of particular faults, e.g., by 14.4% for RefOver faults and by 2%~4% for both Refleak and Exoil faults.
Computer Software
Automatic Assignment Method for Software Bug Based on Multivariate Features of Developers
DONG Xia-lei, XIANG Zheng-long, WU Hong-run, WANG Ding-wen, LI Yuan-xiang
Computer Science. 2022, 49 (12): 81-88.  doi:10.11896/jsjkx.211100040
Software bug repair is an unavoidable part of the software life cycle, and how to assign software bugs automatically and efficiently is a very important research direction. Existing methods mainly focus on the bug report's text content or low-level information in the developers' tossing network, while ignoring the high-level topological information in the tossing network. Therefore, this paper proposes MFD-GCN, a software bug automatic assignment model based on developers' multivariate features. The model fully considers the high-level topological features in the developers' tossing network, uses the powerful feature extraction capability of graph convolutional networks to mine multivariate features that represent developers' deep collaboration relationships and fixing preferences, and trains the classifier together with the bug text features. The proposed method is evaluated on two large open-source software projects, Eclipse and Mozilla. Experimental results show that, compared with mainstream bug assignment methods proposed in recent years, the MFD-GCN model achieves state-of-the-art results in recommending the top-K developers, with top-1 recommendation accuracy reaching 69.8% and 59.7% on the Eclipse and Mozilla projects, respectively.
Empirical Study on Defects in R Programming Language and Core Packages
WANG Zi-yuan, BU De-xin, LI Ling-ling, ZHANG Xia
Computer Science. 2022, 49 (12): 89-98.  doi:10.11896/jsjkx.220200181
The R programming language, which provides a variety of statistical computing functions, is considered one of the programming languages best suited to artificial intelligence. The correctness of a language's implementation is a prerequisite for the correctness of programs developed in it, yet the R language inevitably contains many defects. This paper conducts an empirical study on defects in the R programming language and its core packages. By analyzing 7020 issues, we find that: 1) among the 35 versions involved, R 3.1.2, R 3.0.2, and R 3.5.0 contain the most defects, and these defects are primarily distributed in a few components such as Documentation, Graphics, and Language; 2) the components with higher overall defect priority include Startup, Installation, and Analyses, the components with higher overall defect severity include I/O, Installation, and Accuracy, and there is a significant moderate correlation between defect priority and severity; 3) about 78% of defects can be repaired within one year; 4) semantic faults are the most frequent root cause of defects, among which "missing feature" and "processing" faults outnumber the others. These findings reveal patterns of defects in the R programming language and its core packages; they can help developers of the R language improve development quality, help maintainers detect and repair defects more effectively, and suggest how users of the R language can evade potential risks.
Developer Recommendation Method for Crowdsourcing Tasks in Open Source Community
JIANG Jing, PING Yuan, WU Qiu-di, ZHANG Li
Computer Science. 2022, 49 (12): 99-108.  doi:10.11896/jsjkx.220400289
Gitcoin is a crowdsourcing platform built on the open-source community GitHub. In Gitcoin, project teams release development tasks; developers register for tasks they are interested in, and the publisher selects an appropriate developer to complete the task for a reward. However, some tasks fail for lack of registrants, some tasks are not performed properly, and even successfully completed tasks face long developer registration intervals. Therefore, a developer recommendation method is needed to quickly find suitable developers for crowdsourcing tasks, shorten the time for developers to register, and find and motivate potential suitable developers, so as to promote the successful completion of crowdsourcing tasks. This paper proposes DEVRec, a developer recommendation method based on the LGBM (LightGBM) classification algorithm. Firstly, task-related characteristics, developer-related characteristics, and the relationships between developers and tasks are extracted from crowdsourcing task assignment records. Then the LGBM classification algorithm is used for binary classification to estimate the probability that a developer registers for the task, and finally a list of recommended developers for the task is produced. To evaluate the recommendation effect, 1 599 completed crowdsourcing tasks, 343 publishers, and 1 605 developers are crawled from the Gitcoin platform. Experimental results show that, compared with the Policy Model, the top-1, top-3, top-5, and top-10 recommendation accuracy of DEVRec improves by 73.11%, 119.07%, 86.55%, and 29.24%, respectively, and the MRR improves by 62.27%.
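A rough sketch of the scoring-and-ranking core, using LightGBM's scikit-learn interface (the features here are synthetic stand-ins, not the task and developer characteristics engineered in the paper):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
# Hypothetical features for (task, developer) pairs; label = registered or not.
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X, y)

candidates = rng.normal(size=(50, 6))        # 50 hypothetical developers
proba = clf.predict_proba(candidates)[:, 1]  # estimated registration probability
top5 = np.argsort(proba)[::-1][:5]           # recommended developer list
print(top5, proba[top5])
```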
Classification of Unreproducible Build Causes Based on Log Information
MA Zhao, LIU Dong, REN Zhi-lei, JIANG He
Computer Science. 2022, 49 (12): 109-117.  doi:10.11896/jsjkx.220300227
A reproducible build is the ability to recreate binary artifacts in a predefined build environment. Because reproducible builds help secure the software construction environment and improve the efficiency of software construction and distribution, many open-source software repositories (such as Debian) have adopted reproducible build practices. However, owing to the lack of sufficient judgment information and the complexity and diversity of source files, determining why software cannot be built reproducibly remains a time-consuming and laborious challenge. To overcome this challenge, this paper studies machine learning-based classification of the causes of unreproducible builds, covering four typical causes: timestamp, file ordering, randomness, and locale. The method represents text logs with word vectors generated by word2vec and trains a logistic regression model on a corpus combining the difference logs and the build logs, thereby automatically classifying the causes of unreproducible builds. The algorithm is implemented and tested on 671 unreproducibly built Debian software packages. Experimental results show that the method achieves a macro-average precision of 80.75% and a macro-average recall of 86.07%, outperforming other commonly used machine learning algorithms. In addition, an analysis of the relevance and importance of difference logs and build logs indicates that both are significant for classifying the causes of unreproducible builds. This method provides a reliable research basis for the automatic classification of unreproducible build causes.
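The representation-plus-classifier pipeline can be sketched as follows (the tokenized logs here are toy examples; the real corpus combines Debian difference logs and build logs):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

logs = [["gzip", "timestamp", "mismatch"], ["tar", "file", "order", "differs"],
        ["random", "seed", "embedded"], ["locale", "dependent", "sort"]] * 25
labels = ["timestamp", "fileordering", "randomness", "locale"] * 25

w2v = Word2Vec(sentences=logs, vector_size=32, min_count=1, seed=1)

def log_vector(tokens):
    """Represent a log as the average of its tokens' word vectors."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.stack([log_vector(t) for t in logs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict([log_vector(["timestamp", "mismatch"])]))
```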
Software Diversity Evaluation Method Based on Multi-granularity Features
CHI Yu-ning, GUO Yun-fei, WANG Ya-wen, HU Hong-chao
Computer Science. 2022, 49 (12): 118-124.  doi:10.11896/jsjkx.211200029
Aiming at the problem that existing software diversity evaluation methods generally adopt a single feature, a software diversity evaluation method based on multi-granularity features is proposed. The method analyzes programs at four granularities: instruction, function, basic block, and binary file. First, features at different granularities are obtained through a small-prime-product method and a dynamic weight distribution algorithm; then the granularities are analyzed according to the effectiveness of the diversification techniques. In the experiments, GNU coreutils is used to comprehensively evaluate 7 software diversification methods, and the results are analyzed to verify the applicability of the evaluation algorithm. Experimental results show that the evaluation method can accurately evaluate the effectiveness of software diversification methods both vertically and horizontally, providing a reference for subsequent research on diversification techniques.
Study on Anomaly Detection and Real-time Reliability Evaluation of Complex Component System Based on Log of Cloud Platform
WANG Bo, HUA Qing-yi, SHU Xin-feng
Computer Science. 2022, 49 (12): 125-135.  doi:10.11896/jsjkx.220200106
Reliability, usability, and security are three important indicators of software quality, with reliability the most important. Traditional software reliability evaluation and prediction treats the software system as a whole, or views its invocation structure as static. Today's software architectures have changed significantly: typical features such as autonomy, coordination, evolution, dynamism, and adaptivity now permeate complex networked software systems, and traditional reliability evaluation and prediction methods cannot adapt to such architectures and environments. In today's high-speed information society, where "software defines everything", massive information systems generate large-scale data resources, and the diversity and complexity of log resources result from the heterogeneity, parallelism, complexity, and huge scale of modern information systems; accurate log-based analysis and anomaly prediction are therefore particularly important for building safe and reliable systems. The existing literature contains much research on anomaly prediction and software reliability, but little on real-time software reliability measurement for massive logs and complex networked component systems. Accordingly, following the complete log-processing pipeline, from log parsing, feature extraction, anomaly detection, and prediction evaluation to real-time reliability evaluation, this paper uses an ensemble learning model to analyze and predict anomalies in massive system logs, and compares it with traditional machine learning methods to improve the precision, recall, and F1 value of anomaly prediction. The evaluation result is used to correct the real-time reliability in view of the low predicted recall rate, which greatly improves the accuracy of real-time reliability. Based on individual component reliabilities, system reliability based on Markov theory is used to measure the reliability of microservice composite components, providing an accurate data basis and anomaly localization basis for intelligent operation and maintenance.
Automating Release of Android APIs Based on Computational Reflection
WANG Yi, CHEN Ying-ren, CHEN Xing, LIN Bin, MA Yun
Computer Science. 2022, 49 (12): 136-145.  doi:10.11896/jsjkx.211100066
With the development of mobile hardware and 5G communication technology, smart applications are booming and have penetrated every aspect of life and work. Applications offer many functions that not only satisfy users' requirements but can also be released as APIs for external invocation; for instance, APIs provided by applications can be invoked by an intelligent voice assistant. However, these functions must be released as APIs during the development phase, otherwise they cannot be invoked externally. To address this problem, this paper proposes an approach to the automatic release of APIs of Android applications based on computational reflection. It first rebuilds the runtime software architecture of an Android application's activities through the reflection mechanism, without modifying the application's source code. Then, based on test cases of the specified function, it analyzes the user-behavior workflow and the corresponding procedure calls. Finally, the function is invoked by simulating the user behaviors and released as the corresponding API. We evaluate the approach with 300 popular apps from the Android app store Wandoujia, and the results show that it is effective for 280 of them. For a specified function, an API can be implemented in about 15 minutes, with desirable runtime performance.
Database & Big Data & Data Science
Method of Attributed Heterogeneous Network Embedding with Multiple Features
TANG Qi-you, ZHANG Feng-li, WANG Rui-jin, WANG Xue-ting, ZHOU Zhi-yuan, HAN Ying-jun
Computer Science. 2022, 49 (12): 146-154.  doi:10.11896/jsjkx.211200082
Network embedding aims to represent the nodes of an unstructured network with low-dimensional real-valued vectors, so that node embeddings retain the structural and attribute features of the original network as much as possible. However, current research mainly focuses on embedding the network structure; few studies consider the semantically rich relationship attributes and node attributes in heterogeneous information networks, which may cause semantic loss in node embeddings and affect the prediction performance of downstream applications. To solve this problem, this paper designs MFAHNE, a method of attributed heterogeneous network embedding with multiple features. The method integrates the relationship attributes, node attributes, and structural semantics of the network into the final node embedding through the steps of sequence sampling, structural-feature embedding, attribute-feature embedding, and feature fusion. Experimental results show that this method can jointly capture structural and attribute features, letting the two kinds of feature information complement each other, and outperforms traditional network embedding methods.
Overlapping Community Detection Algorithm Based on Local Path Information
ZHENG Wen-ping, WANG Ning, YANG Gui
Computer Science. 2022, 49 (12): 155-162.  doi:10.11896/jsjkx.220500190
Overlapping community detection is one of the main tasks of complex network analysis. The performance of most existing methods based on local expansion and optimization is greatly affected by the selection of initial seed nodes and the measure of community structure significance. Aiming at these problems, an overlapping community detection algorithm based on local path information (LPIO) is proposed. First, local maximum-degree nodes are selected as initial seeds, and the seeds are updated according to the label consistency of nodes' neighborhoods in the community, reducing the influence of initial seed selection. To measure the various connection patterns between nodes, a community fitness function based on local path information is defined to grow community structures from the seed nodes. Finally, unclustered nodes are assigned to appropriate communities according to the number of non-repetitive paths between them and the community seed sets. Comparative experiments against 7 classic overlapping community detection algorithms on 4 labeled networks and 8 unlabeled networks show that the proposed algorithm performs well on overlapping normalized mutual information (ONMI), F1-score, and extended modularity (EQ).
Disentangled Sequential Variational Autoencoder for Collaborative Filtering
WU Mei-lin, HUANG Jia-jin, QIN Jin
Computer Science. 2022, 49 (12): 163-169.  doi:10.11896/jsjkx.211200080
Recommendation models typically use users' historical behaviors to obtain user preference representations for recommendation. Most methods of learning user representations entangle different preference factors, whereas disentangled learning can decompose user behavior characteristics. In this paper, DSVAECF, a variational autoencoder-based framework, is proposed to disentangle the static and dynamic factors in users' historical behaviors. Firstly, the model's two encoders use a multi-layer perceptron and a recurrent neural network, respectively, to model the user's behavior history, obtaining static and dynamic preference representations. Then the concatenated static and dynamic preference representations are treated as a disentangled representation and fed into the decoder to capture users' decisions and reconstruct their behavior. In the training phase, DSVAECF learns model parameters by maximizing the mutual information between reconstructed and actual user behaviors, while minimizing the difference between the disentangled representations and their prior distribution to retain the model's generative ability. Experimental results on Amazon and MovieLens datasets show that, compared with the baselines, DSVAECF significantly improves normalized discounted cumulative gain, recall, and precision, showing better recommendation performance.
Node Local Similarity Based Two-stage Density Peaks Algorithm for Overlapping Community Detection
DUAN Xiao-hu, CAO Fu-yuan
Computer Science. 2022, 49 (12): 170-177.  doi:10.11896/jsjkx.211000025
To detect overlapping community structures in complex networks, the idea of the density peaks clustering algorithm is introduced. However, applying density peaks clustering to community detection still poses problems such as how to measure the distance between nodes and how to generate overlapping partitions. Therefore, a node local similarity-based two-stage density peaks algorithm for overlapping community detection (LSDPC) is proposed. By combining the hub promoted index and the connection contribution degree, a new node local similarity index is defined, and node distance is measured by this local similarity. The local density and minimum distance of each node are then used to compute its center value, and Chebyshev's inequality is used to select community center nodes; overlapping communities are obtained through an initial assignment followed by an overlapping assignment. Experimental results on real and synthetic network datasets show that the proposed algorithm can effectively detect overlapping community structure and outperforms other algorithms.
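A minimal sketch of the center-selection idea (our illustration; the paper's density, distance, and threshold definitions may differ): compute a center value γ = ρ·δ per node and keep the outliers that a Chebyshev-style bound marks as rare.

```python
import numpy as np

def select_centers(rho, delta, k=1.0):
    """Density peaks center selection: gamma = rho * delta. Chebyshev's
    inequality gives P(|gamma - mean| >= k*std) <= 1/k^2, so nodes with
    gamma above mean + k*std are rare enough to treat as centers."""
    gamma = np.asarray(rho, dtype=float) * np.asarray(delta, dtype=float)
    return np.flatnonzero(gamma > gamma.mean() + k * gamma.std())

rho = np.array([5, 1, 1, 6, 1, 1, 1, 1])    # local densities (invented)
delta = np.array([4, 1, 1, 5, 1, 1, 1, 1])  # min distance to a denser node
print(select_centers(rho, delta, k=1.0))     # [0 3]
```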
Variational Recommendation Algorithm Based on Differential Hamming Distance
DONG Jia-wei, SUN Fu-zhen, WU Xiang-shuai, WU Tian-hui, WANG Shao-qing
Computer Science. 2022, 49 (12): 178-184.  doi:10.11896/jsjkx.220600024
Current hashing-based recommendation algorithms commonly use the Hamming distance between user and item hash codes to indicate similarity, ignoring the potential differences among bit dimensions. Therefore, this paper proposes a differential Hamming distance, which assigns bit weights by calculating the dissimilarity between hash codes, and designs a variational recommendation model based on it. The model consists of a user hash component and an item hash component connected by a variational autoencoder structure, with encoders generating hash codes for users and items. To improve the robustness of the hash codes, Gaussian noise is applied to both user and item hash codes, and the codes are optimized with the differential Hamming distance to maximize the model's ability to reconstruct user-item ratings. Experiments on benchmark datasets demonstrate that, at the same computational cost, the proposed algorithm VDHR improves NDCG by 3.9% and MRR by 4.7% compared with state-of-the-art hash recommendation algorithms.
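The differential Hamming distance amounts to a per-bit-weighted mismatch count; a minimal sketch (hash codes and bit weights invented for illustration):

```python
import numpy as np

def weighted_hamming(u, v, w):
    """Hamming distance with per-bit weights: each disagreeing bit
    contributes its learned weight instead of a constant 1."""
    return float(np.sum(w * (u != v)))

u = np.array([1, 0, 1, 1])           # user hash code
v = np.array([1, 1, 0, 1])           # item hash code
w = np.array([0.1, 0.5, 0.3, 0.1])   # per-bit weights
print(weighted_hamming(u, v, w))     # 0.8: bits 1 and 2 differ
```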
Framework of Business Intelligence and Analysis Based on Data Fusion
LI Ai-hua, XU Wei-jia, SHI Yong
Computer Science. 2022, 49 (12): 185-194.  doi:10.11896/jsjkx.211100080
The emergence of business intelligence and analytics (BI&A) 3.0 and the broadening application scenarios of information fusion have increased the importance of data fusion in business intelligence. More and more research in economics, finance, and management uses the ideas and methods of fusion, and the application of data fusion in these fields shows characteristics different from traditional information fusion. Considering the concepts of information fusion and BI&A, this paper puts forward a new interpretation of BI&A from the perspective of data fusion against the background of multi-source, heterogeneous big data, highlighting the importance of data fusion in BI&A. In addition, the paper constructs a "data, information, knowledge" fusion framework for BI&A based on the WSR (Wuli-Shili-Renli) system methodology, so that data fusion can be better applied in economics, finance, and management. The framework provides a scientific basis for acquiring knowledge from massive multi-source heterogeneous data and benefits the development and implementation of more effective business intelligence systems.
Integrating XGBoost and SHAP Model for Football Player Value Prediction and Characteristic Analysis
LIAO Bin, WANG Zhi-ning, LI Min, SUN Rui-na
Computer Science. 2022, 49 (12): 195-204.  doi:10.11896/jsjkx.210600029
With the increasing globalization of football, the global player transfer market is becoming more and more prosperous. However, the player's transfer value, the most important factor in transfer transactions, lacks in-depth modeling and applied research. This paper takes FIFA's official player database as the research object. Firstly, distinguishing different player positions, feature processing such as the Box-Cox transformation and F-score feature selection is applied to the original dataset. Secondly, a player value prediction model is constructed with XGBoost and compared with mainstream machine learning algorithms such as random forest, AdaBoost, GBDT, and SVR in 10-fold cross-validation experiments. Experimental results show that the XGBoost model outperforms the existing models on the R2, MAE, and RMSE metrics. Finally, on the basis of the value prediction model, the SHAP framework is integrated to analyze the important factors affecting players' value in different positions, providing decision support for scenarios such as player value evaluation, comparative analysis, and training strategy formulation.
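A compact sketch of the XGBoost-plus-SHAP pattern (the features are synthetic stand-ins, not the FIFA attributes used in the paper):

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
# Hypothetical player features, e.g. age, pace, shooting, passing.
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 2] + 1.0 * X[:, 3] - 0.5 * X[:, 0] + rng.normal(size=500)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)     # attributes predictions to features
shap_values = explainer.shap_values(X)
print(np.abs(shap_values).mean(axis=0))   # global importance per feature
```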
Computer Graphics & Multimedia
Small Object Detection Based on Deep Convolutional Neural Networks:A Review
DU Zi-wei, ZHOU Heng, LI Cheng-yang, LI Zhong-bo, XIE Yong-qiang, DONG Yu-chen, QI Jin
Computer Science. 2022, 49 (12): 205-218.  doi:10.11896/jsjkx.220500260
Small object detection has long been one of the most challenging problems in computer vision. Because small objects cover a small area, have low resolution, and lack feature information, their detection results are unsatisfactory compared with those of large objects. In recent years, small object detection algorithms based on deep convolutional neural networks have developed vigorously and been successfully applied in fields such as satellite remote sensing and driverless vehicles. This survey presents a taxonomy, analysis, and comparison of existing algorithms. First, the difficulties of small object detection and common detection datasets are introduced. Second, existing detection algorithms are systematically described from five aspects: backbone network, pyramid structure, anchor design, object optimization, and bags of tricks, providing ideas for further improving the performance of small object detection algorithms. Then, existing small object detection algorithms are briefly summarized and their performance on common datasets is analyzed. Finally, applications and future research directions in the field of small object detection are discussed.
Research Progress of Deep Learning Methods in Two-dimensional Human Pose Estimation
ZHANG Guo-ping, MA Nan, GUAN Huai-guang, WU Zhi-xuan
Computer Science. 2022, 49 (12): 219-228.  doi:10.11896/jsjkx.210900041
The task of human pose estimation is to locate and detect the key points of the human body in images or videos. It has long been one of the hot research directions in computer vision, is a key step for computers to understand human actions, and is widely applied to predicting two-dimensional human keypoint poses in images and videos. Using the powerful image feature extraction capability of deep learning, two-dimensional human pose estimation has improved in robustness, accuracy, and processing time, far surpassing traditional methods. According to the number of persons in the two-dimensional pose, methods divide into single-person and multi-person pose estimation. Single-person pose estimation, according to the representation of the extracted keypoints, divides into coordinate regression methods, which directly predict keypoint coordinates, and heatmap detection methods, which predict a Gaussian distribution over each keypoint's location. Multi-person pose estimation divides into top-down methods, which reduce the multi-person problem to single-person estimation, and bottom-up methods, which directly detect the keypoints of all persons. Based on existing human pose estimation methods, this paper analyzes the internal mechanisms of the network structures, reviews commonly used datasets and evaluation metrics, and elaborates current problems and future development trends.
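For the heatmap-based family, the ground truth for each keypoint is a 2D Gaussian centered on the annotated coordinate; a minimal sketch of the encoding and its argmax decoding (sizes are arbitrary):

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=2.0):
    """Ground-truth heatmap for one keypoint: a 2D Gaussian centered at
    (cx, cy), as used by heatmap-based pose estimators."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

hm = keypoint_heatmap(64, 48, cx=20, cy=30)
cy_hat, cx_hat = np.unravel_index(hm.argmax(), hm.shape)  # decode = argmax
print(cx_hat, cy_hat)  # 20 30
```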
Visual Question Answering Method Based on Counterfactual Thinking
YUAN De-sen, LIU Xiu-jing, WU Qing-bo, LI Hong-liang, MENG Fan-man, NGAN King-ngi, XU Lin-feng
Computer Science. 2022, 49 (12): 229-235.  doi:10.11896/jsjkx.220600038
Visual question answering (VQA) is a multi-modal task that combines computer vision and natural language processing and is extremely challenging. Current VQA models are often misled by apparent correlations in the data, and their output is directly guided by language bias. Much previous research focuses on alleviating language bias and assisting the model with counterfactual sample methods; these studies, however, ignore the prediction information in counterfactual samples and the difference between key and non-key features. In view of this, this paper proposes a contrastive learning paradigm based on counterfactual samples, under which the model distinguishes the original sample, the factual sample, and the counterfactual sample. By comparing the three samples in terms of feature gaps and prediction gaps, the robustness of the VQA model improves significantly. Compared with the CL-VQA method, the proposed method improves overall precision, average precision, and the Num metric by 0.19%, 0.89%, and 2.6%, respectively; compared with the CSSVQA method, its Gap decreases from 0.96 to 0.45.
RGBT Object Tracking Based on High Rank Feature and Position Attention
YANG Lan-lan, WANG Wen-qi, WANG Fu-tian
Computer Science. 2022, 49 (12): 236-243.  doi:10.11896/jsjkx.220600037
RGBT object tracking uses the complementary advantages of the visible (RGB) and thermal infrared (T) modalities to overcome the modality limitations common in single-modality tracking and improve performance in complex environments. In RGBT tracking, precise localization of the object and effective fusion of the two modalities are crucial. To this end, this paper proposes a new RGBT tracking method that explores high-rank feature maps and introduces position attention. The method first applies position attention over the deep and shallow features of the backbone network to focus on the object's location information, and then, before fusing the two modalities, gauges feature importance by examining high-rank feature maps to guide the fusion of modal features. To capture object location information, average pooling is applied along rows and columns. In the high-rank feature guidance module, fusion is guided by the rank of each feature map, and feature maps with small rank are deleted directly to remove redundancy and noise and achieve a more robust feature representation. Experimental results on two RGBT tracking benchmark datasets show that, compared with other RGBT tracking methods, the proposed method achieves better precision and success rate.
Handwritten Numeral Recognition Based on Improved Sigmoid Convolutional Neural Network
FAN Ji-hui, TENG Shao-Hua, JIN Hong-Lin
Computer Science. 2022, 49 (12): 244-249.  doi:10.11896/jsjkx.211000179
Deep learning is widely used in digit recognition. This work builds a neural network model in which neurons apply a nonlinear activation function, pairing different activation functions with different parameter initialization strategies, and trains it on the MNIST handwritten digit dataset to build an analysis model and recognize digits in images, reducing high-dimensional data to a compact representation while effectively retaining image features. Analyzing the image data, adding a feature conversion process, and using a gradient descent optimizer to build the network structure and reduce data dimensionality effectively avoids overfitting. The model is compiled and trained with a cross-entropy loss, and the output classification results are further analyzed. A K-nearest neighbor (KNN) classifier is added to further improve classification and prediction accuracy. In experiments on the MNIST dataset, the baseline recognition rate is about 96.2%; after introducing the KNN algorithm into the output layer, combined with the fully connected and softmax layers of a traditional convolutional neural network (CNN), the recognition rate after cross-validation reaches 99.6%.
Tumor Recognition Method Based on Domain Adaptive Algorithm
TIAN Tian-yi, SUN Fu-ming
Computer Science. 2022, 49 (12): 250-256.  doi:10.11896/jsjkx.220600008
Paying attention to human life and health and conducting regular cancer screening is extremely important. A tumor recognition model based on a domain-adaptive algorithm is proposed to deal with small tumor image datasets that are partly unlabeled. The backbone consists of three networks: a feature extractor, a domain discriminator, and a label classifier. The feature extractor extracts features from the source and target domains to learn tumor characteristics; the label classifier classifies tumor images; and the domain discriminator determines which domain a data feature comes from. The label classifier plays a game against the domain discriminator over the data distributions of the source and target domains until the two distributions become consistent in the feature space, after which the classifier can classify target-domain data. Experimental results on the BreakHis dataset show that the average accuracy of the proposed model reaches 87.6%, improvements of 16.2% and 14.1% over two classic domain-adaptive methods, and that the proposed method performs well in classifying unlabeled datasets.
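The adversarial game described here matches the classic gradient reversal construction from DANN; the PyTorch sketch below (layer sizes invented) shows that trick, though the paper's exact architecture may differ.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the
    backward pass, so training the domain discriminator pushes the
    feature extractor toward domain-indistinguishable features."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

features = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())
label_clf = nn.Linear(64, 2)    # tumor classes
domain_clf = nn.Linear(64, 2)   # source vs. target domain

x = torch.randn(8, 1, 32, 32)   # dummy image batch
f = features(x)
class_logits = label_clf(f)
domain_logits = domain_clf(GradReverse.apply(f, 1.0))
print(class_logits.shape, domain_logits.shape)
```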
Pest Identification Method Based on TPH-YOLOv5 Algorithm and Small Sample Learning
ZHU Xiang-yuan, NIE Hong, ZHOU Xu
Computer Science. 2022, 49 (12): 257-263.  doi:10.11896/jsjkx.221000203
Pest identification based on deep convolutional object detection is an important application of smart agriculture: it enables pest monitoring and helps ensure stable agricultural production. To address the high missed detection rate for small pests and the low precision under small samples, a pest identification method based on the TPH-YOLOv5 algorithm and small-sample learning is proposed. First, data augmentation for small objects and small samples is designed: copy-paste, cropping, and oversampling increase the number of training samples and diversify pest locations, increasing their contribution to the training loss. Second, a two-stage small-sample learning strategy based on fine-tuning is constructed: by learning the characteristics of base and novel pest categories in different stages, the recognition precision of base categories does not decrease while new pests are learned, which suits practical agricultural applications that continuously collect pest data. Finally, TPH-YOLOv5 is introduced as the pest identification algorithm. Rigorous tests on images of 28 pest categories show that the proposed method achieves high learning efficiency and recognition accuracy, with precision, recall, and mean average precision (mAP) of 87.6%, 84.9%, and 85.7%, respectively.
Artificial Intelligence
Survey on Event Extraction Technology
ZHU Yi-na, CAO Yang, ZHONG Jing-yue, ZHENG Yong-zhi
Computer Science. 2022, 49 (12): 264-273.  doi:10.11896/jsjkx.211100226
Event extraction studies how to extract event information that users are interested in from unstructured natural language text. It is an important branch of information extraction and has in recent years been widely used in intelligence analysis, intelligent question answering, information retrieval, and recommendation systems. Beginning with the concepts and tasks of event extraction, this paper comprehensively reviews the datasets and methods of event extraction, analyzes the research progress on event extraction tasks, and summarizes methods based on pattern matching, machine learning, and deep learning. According to the models' learning methods and the range of features they use, it focuses on deep learning-based methods, discussing and analyzing the advantages and disadvantages of different approaches. Finally, the challenges and future research trends of event extraction are summarized, with attention to low-resource scenarios, poor generalization ability, and the difficulty of modeling document-level event extraction.
Branch & Price Algorithm for Resource-constrained Project Scheduling Problem
ZHANG Yu-zhe, DONG Xing-ye, ZHOU Zheng
Computer Science. 2022, 49 (12): 274-282.  doi:10.11896/jsjkx.211100276
The resource-constrained project scheduling problem (RCPSP) is the most representative scheduling problem and an abstraction of many practical scheduling problems. It is NP-hard, and global optima are difficult to obtain for large instances. In this paper, an integer programming model is proposed. By decomposing the model into a restricted master problem and subproblems, a column generation method is designed to solve the linear relaxation, and integer solutions are then found through branch & price. During solving, relaxation variables are introduced to deal with the pseudo-infeasibility of the model. Furthermore, a pruning strategy, a branching strategy, and two situation-dependent methods for reducing the solution space are designed. On the PSPLIB dataset, for problems with 30 activities, the current best-known solutions are obtained within 10 minutes for 301 of 480 instances; for 60 activities, within 20 minutes for 269 of 480 instances; and for 90 activities, within 30 minutes for 263 of 480 instances. Moreover, the solution-space reduction strategies significantly decrease the number of timeout instances and markedly improve the quality of the optimized initial solution. Experimental results show that the proposed algorithm is effective.
k-step Reachability Query Processing on Label-constrained Graph
DU Ming, XING Rui-ping, ZHOU Jun-feng, TAN Yu-ting
Computer Science. 2022, 49 (12): 283-292.  doi:10.11896/jsjkx.211000077
A k-step reachability query on a label-constrained graph answers whether there is a path between two vertices whose length is at most k and whose labels all belong to a specified label set. Such queries are widely used in practice, but no existing algorithm answers them. Therefore, the LK2H algorithm is first proposed. LK2H consists of two steps: first, a 2-hop index containing distance and label information is built for all vertices of the graph; second, queries are answered over the built index. To return as much information as possible to the user, LK2H optimizes the results of one type of unreachable query: when the user cannot specify all label types and the incomplete label constraint makes the query result unreachable, the complete label set is returned to the user. Secondly, an optimized algorithm, LK2H+, is proposed. LK2H+ further reduces the index size and construction time by building a 2-hop index for only part of the vertices and answering queries over this partial index, with the query procedure distinguishing whether the query vertices are indexed. Finally, tests on 15 real-world datasets show that both LK2H and LK2H+ can answer k-step reachability queries on label-constrained graphs efficiently and quickly.
Text Classification Based on Graph Neural Networks and Dependency Parsing
YANG Xu-hua, JIN Xin, TAO Jin, MAO Jian-fei
Computer Science. 2022, 49 (12): 293-300.  doi:10.11896/jsjkx.220300195
Text classification is a basic and important task in natural language processing, widely used in scenarios such as news classification, topic tagging, and sentiment analysis. Current text classification models generally consider neither the co-occurrence relationships of words nor the syntactic characteristics of the text itself, which limits classification performance. Therefore, a text classification model based on graph convolutional neural networks (Mix-GCN) is proposed. First, based on the co-occurrence relationships and syntactic dependencies between words, the text is built into a word co-occurrence graph and a syntactic dependency graph. A GCN then performs representation learning on both graphs to obtain word embedding vectors; text embedding vectors are obtained through graph pooling and adaptive fusion, and text classification is completed with a graph classification method. Mix-GCN simultaneously considers the relationships between adjacent words and the syntactic dependencies between words, improving classification performance. Experimental results on 6 benchmark datasets against 8 well-known text classification methods show that Mix-GCN achieves good classification performance.
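Building the word co-occurrence graph can be sketched with a sliding window (window size and edge weighting are illustrative choices, not necessarily the paper's):

```python
import networkx as nx

def cooccurrence_graph(tokens, window=3):
    """Connect words that co-occur within a sliding window; edge weight
    counts how often the pair co-occurs."""
    G = nx.Graph()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            u, v = tokens[i], tokens[j]
            if u != v:
                w = G.get_edge_data(u, v, {"weight": 0})["weight"]
                G.add_edge(u, v, weight=w + 1)
    return G

G = cooccurrence_graph("the cat sat on the mat".split())
print(G.edges(data=True))
```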
GAN and Chinese WordNet Based Text Summarization Technology
LIU Xiao-ying, WANG Huai, WU Jisiguleng
Computer Science. 2022, 49 (12): 301-304.  doi:10.11896/jsjkx.210600166
Abstract PDF(1588KB) ( 313 )   
References | Related Articles | Metrics
Since the introduction of neural networks, text summarization techniques have continued to attract researchers' attention. Generative adversarial networks (GANs) can likewise be used for text summarization, because they can generate text features, learn the distribution of the whole sample, and produce correlated sample points. In this paper, we exploit these features of GANs for the abstractive text summarization task. The proposed generative adversarial model has three components: a generator, which encodes the input sentences into a shorter representation; a readability discriminator, which forces the generator to create comprehensible summaries; and a similarity discriminator, which acts on the generator to keep the output summary correlated with the input text. In addition, Chinese WordNet is used as an external knowledge base to strengthen the similarity discriminator. The generator is optimized with a policy gradient algorithm, converting the problem into reinforcement learning. Experimental results show that the proposed model achieves high ROUGE scores.
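The policy-gradient step can be illustrated briefly. In the PyTorch sketch below (a schematic with stand-in tensors, not the paper's model), the discriminators' scores act as the reward, and the REINFORCE loss weights the log-probability of each sampled summary by that reward.

    import torch

    def reinforce_loss(log_probs, rewards):
        """REINFORCE: maximize E[reward] => minimize -E[(r - b) * log pi]."""
        baseline = rewards.mean()                # simple variance-reduction baseline
        return -((rewards - baseline) * log_probs.sum(dim=1)).mean()

    # Toy shapes: batch of 4 sampled summaries, 12 tokens each.
    log_probs = torch.log(torch.rand(4, 12)).requires_grad_()
    # Reward = readability score + similarity score from the two discriminators.
    rewards = 0.5 * torch.rand(4) + 0.5 * torch.rand(4)
    loss = reinforce_loss(log_probs, rewards)
    loss.backward()                              # gradients flow to the generator
    print(loss.item())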
PosNet:Position-based Causal Relation Extraction Network
ZHU Guang-li, XU Xin, ZHANG Shun-xiang, WU Hou-yue, HUANG Ju
Computer Science. 2022, 49 (12): 305-311.  doi:10.11896/jsjkx.211100264
Abstract PDF(2916KB) ( 392 )   
References | Related Articles | Metrics
Causal relation extraction is a natural language processing technique that extracts causal entity pairs from text, and is widely used in finance, medicine and other fields. Traditional causal relation extraction techniques need to manually select text features for causal matching, or use neural networks to extract features repeatedly, resulting in complicated model structures and low extraction efficiency. To solve this problem, this paper proposes a position-based causal relation extraction network (PosNet) to improve the efficiency of causal relation extraction. Firstly, the text is preprocessed and multi-granularity text features are constructed as the input of the network. Then the text features are passed into the position prediction network, where a classical shallow convolutional neural network predicts the start and end positions of causal entities. Finally, the causal entities are assembled from the start and end positions by an assembling algorithm, so that all causal entity pairs are extracted. Experimental results show that PosNet can improve the efficiency of causal relation extraction.
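The assembling step can be sketched as a simple pairing rule. The Python function below is an assumed illustration, since the paper's exact algorithm is not reproduced here: each predicted start position is paired with the nearest end position at or after it to recover entity spans.

    def assemble_spans(starts, ends):
        """Pair each start with the nearest end position not before it."""
        spans, ends = [], sorted(ends)
        for s in sorted(starts):
            for e in ends:
                if e >= s:
                    spans.append((s, e))
                    ends.remove(e)       # each predicted end is consumed once
                    break
        return spans

    # Predicted boundaries for two causal entities in a token sequence.
    print(assemble_spans([2, 9], [4, 12]))   # [(2, 4), (9, 12)]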
Re-lightweight Method of MobileNet Based on Low-cost Deformable Convolution
SUN Chang-di, PAN Zhi-song, ZHANG Yan-yan
Computer Science. 2022, 49 (12): 312-318.  doi:10.11896/jsjkx.211200036
Abstract PDF(2813KB) ( 348 )   
References | Related Articles | Metrics
In recent years, with the development of unmanned driving, intelligent UAVs and the mobile Internet, the demand for lightweight neural networks on low-power, low-cost mobile and embedded platforms has become increasingly urgent. Based on the ideas of deformable convolution and depthwise separable convolution, this paper presents a low-cost deformable convolution, which combines the efficient feature-extraction ability of deformable convolution with the low computational complexity of depthwise separable convolution. On this basis, applying low-cost deformable convolution and combining it with model structure compression, four re-lightweighting methods for the MobileNet network are designed. Experiments on the Caltech256, CIFAR100 and CIFAR10 datasets demonstrate that low-cost deformable convolution effectively improves the classification accuracy of lightweight networks without a significant increase in computation. Moreover, combining the four MobileNet re-lightweighting methods improves the accuracy of the MobileNet network by 0.4%~1% while reducing its computational load by 5%~15%, which significantly improves the performance of the lightweight network and better meets practical needs for low power consumption and low computing power. This is of practical significance for advancing intelligence on mobile and embedded platforms.
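The cost saving of depthwise separable convolution, one of the two building blocks here, is easy to show in code. The PyTorch sketch below is the standard construction (not the paper's low-cost deformable operator itself): a 3x3 convolution is split into a per-channel depthwise convolution followed by a 1x1 pointwise convolution.

    import torch.nn as nn

    def depthwise_separable(c_in, c_out):
        return nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups=c_in).
            nn.Conv2d(c_in, c_in, kernel_size=3, padding=1,
                      groups=c_in, bias=False),
            nn.BatchNorm2d(c_in),
            nn.ReLU(inplace=True),
            # Pointwise: 1x1 convolution mixes channels cheaply.
            nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    # Standard 3x3 conv: 32*64*9 = 18432 weights; separable: 32*9 + 32*64.
    block = depthwise_separable(32, 64)
    print(sum(p.numel() for p in block.parameters()))   # far fewer parameters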
Computer Network
Fault Detection Based on Dead Reckoning in VANETs
LIU Jia-xi, WU Na, DING Fei
Computer Science. 2022, 49 (12): 319-325.  doi:10.11896/jsjkx.220200155
Abstract PDF(2327KB) ( 317 )   
References | Related Articles | Metrics
Fault detection is one of the basic components of a fault-tolerant system and ensures that applications on vehicular ad hoc networks run safely and reliably. However, vehicular ad hoc networks differ from traditional mobile ad hoc networks. On the one hand, vehicles move at high speed and may join or leave the system at any time, making the network environment highly dynamic. On the other hand, the links between vehicles are often interrupted by environmental and equipment factors, which is likely to cause message loss. To solve these problems, a hierarchical fault detection method based on dead reckoning is proposed. In this method, a dead reckoning model predicts the transmission time of heartbeat messages, and the roadside unit is treated as a static node to build a hierarchical detection architecture, thereby improving fault detection performance in vehicular ad hoc networks. A simulation platform is built with NS2 for performance verification; experimental results show that the proposed method outperforms the comparison method in detection speed, detection accuracy and detection overhead.
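Dead reckoning itself reduces to extrapolating a node's last reported state. The small Python sketch below (an assumed simplification of the paper's model) predicts a vehicle's position from its last heartbeat and derives a deadline for the next heartbeat; a node missing that deadline while still in radio range becomes suspect.

    import math

    def predict_position(x, y, speed, heading, dt):
        """Extrapolate position after dt seconds (heading in radians)."""
        return (x + speed * math.cos(heading) * dt,
                y + speed * math.sin(heading) * dt)

    def heartbeat_deadline(t_last, period, dist_to_rsu, radio_range, slack=0.2):
        """Next expected heartbeat arrival time at the roadside unit."""
        if dist_to_rsu > radio_range:
            return float("inf")      # predicted out of range: do not suspect yet
        return t_last + period + slack

    # Vehicle last seen at (0, 0) moving 20 m/s east; RSU fixed at (100, 0).
    x, y = predict_position(0, 0, 20, 0.0, 2.0)
    d = math.hypot(x - 100, y - 0)
    print(x, y, heartbeat_deadline(10.0, 1.0, d, 300))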
Dynamic Spectrum Decision-making Method for UAV Swarms in Jamming Environment
QIU Wen-jing, HAN Chen, LIU Ai-jun
Computer Science. 2022, 49 (12): 326-331.  doi:10.11896/jsjkx.220400228
Abstract PDF(2368KB) ( 462 )   
References | Related Articles | Metrics
Unmanned aerial vehicles (UAVs) are widely used in the military field due to their low cost and flexible deployment, and are usually organized as swarm networks for cooperative transmission. Because of the broadcast nature and line-of-sight characteristics of wireless transmission, UAV swarms are vulnerable to malicious jamming attacks. Moreover, because spectrum resources are scarce, the UAV swarm must share limited spectrum, which introduces severe co-channel interference. Cooperative spectrum sharing within a UAV swarm is therefore threatened both by malicious jamming and by mutual interference among UAVs. Specifically, an optimization problem is formulated to maximize the sum rate of UAV swarms in the jamming environment. To improve the effectiveness and reliability of UAV transmission, this paper proposes a distributed spectrum decision-making method based on a coalition formation game to deal with external jamming threats and internal mutual interference, so that dynamic, efficient and intelligent spectrum control can be realized for the UAV swarm under the jamming threat. Meanwhile, with the help of potential game theory, the proposed anti-jamming coalition formation game is shown to converge to a stable coalition partition and achieve a Nash equilibrium.
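A coalition formation game of this type can be sketched as a sequence of selfish switch operations. In the Python toy below (with assumed utilities, not the paper's exact formulation), each UAV's rate falls with the number of co-channel UAVs and with whether the channel is jammed; UAVs keep switching to their best channel until no one benefits, i.e. the partition is Nash-stable.

    from collections import Counter

    channels = [0, 1, 2]
    jammed = {2}                       # channel 2 suffers malicious jamming
    n_uav = 6

    def utility(ch, n_on_ch):
        rate = 10.0 / n_on_ch          # co-channel interference shrinks the rate
        return rate * (0.2 if ch in jammed else 1.0)

    assign = [0] * n_uav               # everyone starts on channel 0
    changed = True
    while changed:                     # switch operations until Nash-stable
        changed = False
        for u in range(n_uav):
            counts = Counter(assign)
            best, cur = assign[u], utility(assign[u], counts[assign[u]])
            for ch in channels:
                n = counts[ch] + (0 if ch == assign[u] else 1)
                if utility(ch, n) > cur + 1e-9:
                    best, cur = ch, utility(ch, n)
            if best != assign[u]:
                assign[u], changed = best, True

    print(assign)                      # stable partition avoiding the jammed channel

Because this is a congestion-style potential game, every switch strictly increases a potential function, which is why the improvement loop must terminate.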
Intelligent Routing Technology for Multi-terminal Access in Integrated Network
XU Yi-ming, MA Li, FU Ying-xun, LI Yang, MA Dong-chao
Computer Science. 2022, 49 (12): 332-339.  doi:10.11896/jsjkx.210900042
Abstract PDF(2893KB) ( 315 )   
References | Related Articles | Metrics
To address the load-balancing problem caused by drastic traffic fluctuations when large numbers of terminal devices access the integrated heterogeneous network, an intelligent routing algorithm based on reinforcement learning, TDANRA, is proposed. Fine-grained, high-precision network traffic status parameters are obtained through software-defined networking; based on the network traffic status and a link bandwidth utilization threshold adjustment mechanism, TDANRA automatically generates real-time routing policies to guide the forwarding of network traffic, thereby coping with drastic traffic fluctuations. Simulation results show that when a large number of terminal devices access the network, TDANRA achieves load balancing of network traffic and reduces end-to-end transmission delay and packet loss rate.
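One common way to realize such a policy is epsilon-greedy path selection over measured link state. The Python sketch below is an assumed illustration of the general reinforcement-learning routing idea, not the published TDANRA internals: it keeps a value estimate per candidate path, penalizes paths whose bottleneck utilization exceeds the threshold, and updates the estimates from observed delay.

    import random

    THRESH = 0.8                        # bandwidth utilization threshold

    def reward(delay_ms, bottleneck_util):
        r = -delay_ms
        if bottleneck_util > THRESH:    # discourage near-saturated links
            r -= 50.0
        return r

    q = {"path_a": 0.0, "path_b": 0.0}  # value estimate per candidate path
    alpha, eps = 0.3, 0.1

    def choose():
        if random.random() < eps:       # explore occasionally
            return random.choice(list(q))
        return max(q, key=q.get)        # otherwise exploit the best path

    def update(path, delay_ms, util):
        q[path] += alpha * (reward(delay_ms, util) - q[path])

    for _ in range(100):                # simulated measurement episodes
        p = choose()
        delay = 10 if p == "path_a" else 25
        util = 0.9 if p == "path_a" else 0.4
        update(p, delay, util)

    print(max(q, key=q.get))            # path_b wins: slower but not saturated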
Information Security
Graph Convolutional Network Adversarial Attack Method for Brain Disease Diagnosis
WANG Xiao-ming, WEN Xu-yun, XU Meng-ting, ZHANG Dao-qiang
Computer Science. 2022, 49 (12): 340-345.  doi:10.11896/jsjkx.220500185
Abstract PDF(3163KB) ( 499 )   
References | Related Articles | Metrics
In recent years, brain functional network analysis using resting-state functional magnetic resonance imaging data has been widely used in computer-aided diagnosis of various brain diseases. Graph convolutional network frameworks that integrate clinical phenotypic measurements with brain functional networks improve the real-world applicability of intelligent disease diagnosis models. However, trustworthiness is an important but still widely neglected aspect of diagnosis models based on brain functional networks. Adversarial attack techniques in medical machine learning can deceive such models, raising security and trust issues when they are applied in clinical practice. On this basis, this paper proposes BFGCNattack, an adversarial attack method against graph convolutional networks for brain disease diagnosis, constructs a diagnosis model integrating clinical phenotypic measurements, and evaluates the robustness of brain functional network-based diagnosis models under adversarial attack. Experimental results on the autism brain imaging data exchange dataset suggest that models built with graph convolutional networks are vulnerable to the proposed attack: even with a small proportion (10%) of perturbations, the model's accuracy and classification margin decrease significantly while the fooling rate rises significantly.
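The general mechanism of such gradient-based attacks can be shown with a fast gradient sign method (FGSM) sketch. The PyTorch code below is a generic feature-perturbation example, not the paper's BFGCNattack: input features are nudged by a small budget epsilon in the direction that increases the classification loss.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.1):
        """Perturb features x by eps in the loss-increasing direction."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).detach()

    # Toy "model": a linear classifier over 16-dim connectivity features.
    model = torch.nn.Linear(16, 2)
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    x_adv = fgsm(model, x, y)
    print((model(x).argmax(1) == y).float().mean(),      # clean accuracy
          (model(x_adv).argmax(1) == y).float().mean())  # adversarial accuracy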
Robust Subgroup ID-based Multi-signature Scheme
TIAN Chen, WANG Zhi-wei
Computer Science. 2022, 49 (12): 346-352.  doi:10.11896/jsjkx.211200101
Abstract PDF(1652KB) ( 344 )   
References | Related Articles | Metrics
Existing multi-signature schemes applied in consensus-mechanism scenarios assume by default that the signers are honest entities, so the security and validity of the signature cannot be guaranteed when malicious nodes exist. To improve the robustness of multi-signatures in the typical adversarial scenarios of consensus protocols, this paper proposes an ID-based multi-signature scheme that builds on the advantages of the ID-based cryptography system. In this scheme, a randomly formed, non-fixed subgroup cooperates to generate multi-signatures representing the entire group, and the validity of every subgroup member's signature must be verified before aggregation. The number of bilinear pairings required to generate a multi-signature is related to the number of subgroup members, which improves the security of the scheme at some cost in efficiency. This paper introduces a notion of robustness for robust subgroup ID-based multi-signatures and gives the corresponding proof for the proposed scheme. Furthermore, under the random oracle model and relying on the hardness of the computational Diffie-Hellman (CDH) problem, the scheme is proved to be unforgeable under adaptive chosen-message attack. In addition, theoretical analysis and a prototype implementation of the signature scheme are carried out, and the experimental results are compared with the performance of related signature schemes.
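The robustness workflow, verifying every partial signature before aggregation so a malicious signer cannot poison the result, can be sketched independently of the pairing machinery. The Python sketch below (assuming the third-party cryptography package) uses ordinary ed25519 signatures as a stand-in for the ID-based pairing signatures, and plain collection as a stand-in for pairing-based aggregation; only the verify-then-aggregate structure is faithful to the scheme.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    msg = b"block #42 header"
    keys = {f"node{i}": Ed25519PrivateKey.generate() for i in range(4)}
    sigs = {nid: sk.sign(msg) for nid, sk in keys.items()}
    sigs["node2"] = b"\x00" * 64           # a malicious/faulty partial signature

    valid = {}
    for nid, sig in sigs.items():          # verify every partial signature first
        try:
            keys[nid].public_key().verify(sig, msg)
            valid[nid] = sig
        except InvalidSignature:
            print("rejected", nid)         # robust: bad signer is excluded

    multisig = (sorted(valid), [valid[n] for n in sorted(valid)])
    print(len(valid), "of", len(sigs), "partial signatures aggregated")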
Reverse Location of Software Online Upgrade Function Based on Semantic Guidance
LYU Xiao-shao, SHU Hui, KANG Fei, HUANG Yu-yao
Computer Science. 2022, 49 (12): 353-361.  doi:10.11896/jsjkx.211000059
Abstract PDF(2168KB) ( 329 )   
References | Related Articles | Metrics
Hijacking attacks on software online upgrades are among the most common network attack methods. Program analysis is an important way to evaluate the security of software upgrades quickly and automatically, and rapidly locating the upgrade functions in software through reverse engineering is a key premise for static analysis and for improving the efficiency of dynamic analysis. Traditional reverse localization relies on manual experience with the cross-reference chains of semantic information such as strings and API functions, which is inefficient and cannot be automated. To solve this problem, this paper proposes a software upgrade function localization method based on semantic analysis and reverse analysis. Firstly, an upgrade semantic classification model based on natural language processing is established for the semantic information commonly found in software binaries (strings, function names, API functions, etc.). Secondly, the software's semantic information is extracted with a reverse analysis tool, and the classification model is used to identify upgrade-related semantics. Finally, an algorithm is defined to find the key nodes of the upgrade function in the function call graph. This paper designs and implements a software online upgrade localization system and performs reverse localization analysis on 153 commonly used software packages, succeeding on 126 of them. The security of some software upgrades is preliminarily evaluated through this analysis, and one CNNVD vulnerability and five CNVD vulnerabilities are found.
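The first two steps can be approximated with a simple keyword-scoring pass. The Python sketch below (an assumed simplification of the paper's NLP classification model, with hypothetical strings and function names) scores strings extracted from a binary by upgrade-related vocabulary and ranks the functions that reference them.

    UPGRADE_TERMS = {"update", "upgrade", "version", "download", "patch"}

    def score_string(s):
        s = s.lower()
        return sum(term in s for term in UPGRADE_TERMS)

    # (string, referencing function) pairs as a reverse tool might export them.
    xrefs = [
        ("http://example.com/update/check", "sub_401200"),
        ("CurrentVersion", "sub_401200"),
        ("Software\\Vendor\\Settings", "sub_4032F0"),
        ("DownloadUpdatePackage failed", "sub_4040A0"),
    ]

    ranking = {}
    for s, func in xrefs:
        ranking[func] = ranking.get(func, 0) + score_string(s)

    for func, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(func, score)   # highest score = best upgrade-function candidate

The paper's key-node algorithm then walks the call graph upward from these candidates to find the function that dominates the upgrade logic.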
Consensus Optimization Algorithm for Proof of Importance Based on Dynamic Grouping
WANG Dong, XIAO Bing-bing, JIN Chen-guang, LI Zheng, LI Xiao-ruo, ZHU Bing-nan
Computer Science. 2022, 49 (12): 362-367.  doi:10.11896/jsjkx.211100282
Abstract PDF(2009KB) ( 356 )   
References | Related Articles | Metrics
The proof-of-stake consensus algorithm (PoS) has the advantage of not requiring computational power. However, the higher a node's stake, the higher its probability of obtaining the bookkeeping rights, which makes the bookkeeping node highly predictable and lets the rich get richer. Moreover, once the node with the highest stake fails to book a block properly, the remaining nodes must compete for the bookkeeping rights again, and the probability of system stagnation rises dramatically. To address these two shortcomings, a consensus optimization algorithm for proof of importance based on dynamic grouping (DPoI) is proposed. The algorithm introduces an importance assessment scheme that computes each node's importance score iValue in every round from its activity, transaction share, time to find random numbers and reputation. Nodes with similar iValue are then grouped dynamically using the Fibonacci series. Within a group, a DPoS-style voting ranking selects alternate nodes, forming a disaster-recovery scheme that effectively avoids system stagnation. Finally, a binary exponential backoff algorithm is designed to quickly remove malicious nodes from the system, effectively enhancing the security and stability of the blockchain system. Experimental results show that DPoI produces blocks about 6 times faster than PoI. Even when the proportion of malicious nodes reaches 70%, the binary exponential backoff algorithm can still effectively reject malicious nodes, fully guaranteeing the reliability of the system.
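The scoring-and-grouping stage can be sketched directly. In the Python code below (with assumed weights, since the paper's exact coefficients are not given), each node's iValue is a weighted sum of its normalized metrics, and nodes are bucketed by the Fibonacci interval their score falls into.

    def ivalue(activity, tx_share, rand_time, reputation,
               w=(0.3, 0.3, 0.2, 0.2)):
        """Weighted importance score; rand_time is inverted (faster is better)."""
        return (w[0] * activity + w[1] * tx_share +
                w[2] * (1.0 - rand_time) + w[3] * reputation) * 100

    def fib_group(score):
        """Return the index of the Fibonacci interval containing the score."""
        a, b, idx = 1, 2, 0
        while score >= b:
            a, b, idx = b, a + b, idx + 1
        return idx

    nodes = {
        "n1": ivalue(0.9, 0.7, 0.2, 0.8),
        "n2": ivalue(0.4, 0.3, 0.6, 0.5),
        "n3": ivalue(0.85, 0.75, 0.25, 0.9),
    }
    groups = {}
    for nid, s in nodes.items():
        groups.setdefault(fib_group(s), []).append(nid)
    print(groups)   # nodes with similar iValue share a Fibonacci bucket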
Selective Shared Image Encryption Method Based on Chaotic System and YOLO v4
ZHANG Guo-mei, MA Lin-juan, ZHANG Fu-quan, LI Qing-zhen
Computer Science. 2022, 49 (12): 368-373.  doi:10.11896/jsjkx.220600139
Abstract PDF(3763KB) ( 372 )   
References | Related Articles | Metrics
Aiming at the information security problem of sharing images on social platforms, a selective region-of-interest (ROI) image encryption scheme based on YOLO v4 and hybrid chaotic map encryption is proposed. YOLO v4 automatically detects objects in the uploaded image and provides the candidate bounding boxes to be encrypted. The image regions selected by the user are then encrypted with the proposed hybrid algorithm combining cosine and polynomial maps, so that only legally authorized users can access the sensitive information in the shared image. Through the key issuing and authorization mechanism, sensitive information in images forwarded by third parties is also protected. Statistical and security analyses show that the proposed scheme resists various attacks, and its processing speed meets the real-time needs of online users.
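The chaotic keystream idea can be shown in a few lines. The numpy sketch below is an assumed illustration (the paper's exact cosine-polynomial map is not reproduced): a cosine-perturbed logistic map is iterated from a secret initial value, its orbit is quantized to bytes, and the bytes are XORed over the selected ROI, so decryption is the same operation with the same key.

    import numpy as np

    def chaotic_stream(x0, r, n):
        """Cosine-perturbed logistic map quantized to a byte keystream."""
        xs, x = np.empty(n), x0
        for i in range(n):
            x = np.cos(np.pi * r * x * (1.0 - x))   # hybrid cosine/polynomial map
            xs[i] = x
        return (np.abs(xs) * 1e6 % 256).astype(np.uint8)

    def xor_roi(img, box, key=(0.3141, 3.99)):
        """Encrypt/decrypt the ROI in place by XOR with the chaotic keystream."""
        y0, y1, x0, x1 = box
        roi = img[y0:y1, x0:x1]
        ks = chaotic_stream(*key, roi.size).reshape(roi.shape)
        img[y0:y1, x0:x1] = roi ^ ks
        return img

    img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    enc = xor_roi(img.copy(), (2, 6, 2, 6))         # box as YOLO might supply it
    dec = xor_roi(enc.copy(), (2, 6, 2, 6))
    print(np.array_equal(dec, img))                 # True: XOR is its own inverse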
Data Center Power Attack Defense Strategy Based on PCPEC
OU Dong-yang, ZHANG Kai-qiang, CHEN Sheng-lei, JIANG Cong-feng, YAN Long-chuan
Computer Science. 2022, 49 (12): 374-380.  doi:10.11896/jsjkx.211000065
Abstract PDF(2096KB) ( 355 )   
References | Related Articles | Metrics
Currently, due to the wide application of multi-tenancy, containerization, virtualization and power over-subscription in data centers, the possibility of power attacks is increasingly high. The main means of a power attack is to run malicious code that drives the power consumption of servers, storage devices and network equipment beyond the limit of the power distribution system, causing server failures or circuit breaker trips, and even interrupting the data center's power supply. To reduce the risk of power attacks on data centers, this paper proposes PCPEC, a power capping method based on performance-equivalent configuration. The method exploits the power-consumption differences among virtual machine configurations to replace a configuration with a performance-equivalent one. Experimental results show that PCPEC reduces the dynamic power consumption of the server by 22.2%~29.6%, while the performance of most virtual machines increases by 2.12% after the resource configuration is replaced, thus effectively reducing the impact of power attacks on the data center.
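The replacement decision reduces to a constrained search over candidate configurations. The Python sketch below (with made-up catalog numbers, purely illustrative of the PCPEC idea) picks, among configurations whose benchmarked performance is at least that of the current one, the configuration with the lowest power draw.

    # Hypothetical VM configuration catalog: name -> (relative perf, watts).
    catalog = {
        "4vcpu-8g":  (1.00, 120.0),
        "2vcpu-16g": (0.98, 95.0),
        "8vcpu-4g":  (1.05, 160.0),
        "4vcpu-16g": (1.02, 110.0),
    }

    def pcpec_replace(current, min_perf_ratio=1.0):
        perf_now, _ = catalog[current]
        # Keep only performance-equivalent (or better) candidates...
        ok = {n: pw for n, (pf, pw) in catalog.items()
              if pf >= perf_now * min_perf_ratio}
        # ...and take the one that draws the least power.
        return min(ok, key=ok.get)

    print(pcpec_replace("4vcpu-8g"))         # 4vcpu-16g: same perf, less power
    print(pcpec_replace("4vcpu-8g", 0.95))   # 2vcpu-16g if 5% perf loss is allowed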