Started in January 1974 (Monthly)
Supervised and Sponsored by Chongqing Southwest Information Co., Ltd.
ISSN 1002-137X
CN 50-1075/TP
CODEN JKIEBK
    Contents of Software & Database Technology in our journal
    Detection Method of Duplicate Defect Reports Fusing Text and Categorization Information
    FAN Dao-yuan, SUN Ji-hong, WANG Wei, TU Ji-ping, HE Xin
    Computer Science    2019, 46 (12): 192-200.   DOI: 10.11896/jsjkx.181102232
    Software defects are the root of software errors and failures. They are caused by unreasonable requirement analysis, imprecise programming languages and developers' lack of experience. Software defects are inevitable, and submitting defect reports is an important way to find and fix them. A defect report is the carrier that describes a defect, and repairing reported defects is a necessary means of improving software. Maintenance personnel and users repeatedly submit reports for the same defect, resulting in a large number of redundant reports in the defect report library, and manual triage cannot keep up with increasingly complex software systems. Duplicate defect report detection can filter redundant reports out of the defect report library, so that human effort and time are invested in new defect reports. The prediction accuracy of current research methods is not high, and the difficulty lies in finding a suitable and comprehensive method to measure the similarity between defect reports. Based on the idea of ensemble methods and implemented in Python, a new method named BSO (combination of BM25F, LSI and One-Hot) for detecting duplicate defect reports was proposed, using both text information and categorization information. After data preprocessing, each defect report is divided into a text information domain and a categorization information domain. The BM25F and LSI algorithms are used to obtain similarity scores in the text information domain, and the One-Hot algorithm is used to obtain similarity scores in the categorization information domain. A similarity fusion method synthesizes the scores of the two domains, a recommendation list of candidate duplicates is generated for each defect report, and the accuracy of duplicate defect report detection is calculated. The proposed method was compared with a baseline method and state-of-the-art methods, including REP and DBTM, on OpenOffice. The experimental results show that the accuracy of the proposed method is 4.7% higher than that of DBTM, 6.3% higher than that of REP, and higher than that of the baseline method, which fully proves the effectiveness of the BSO method.
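    The abstract does not give the exact fusion formula, so the sketch below is a hypothetical reading of the BSO fusion step: BM25F and LSI scores are averaged for the text domain, a One-Hot agreement score covers the categorization domain, and a weighted sum combines the two. The field names, weights and helper functions are all illustrative assumptions.

```python
# Hypothetical sketch of a BSO-style similarity fusion; weights and fields are
# assumptions, not the paper's published configuration.

def one_hot_similarity(report_a: dict, report_b: dict,
                       fields=("product", "component", "priority")) -> float:
    """Fraction of categorical fields on which two defect reports agree."""
    matches = sum(report_a.get(f) == report_b.get(f) for f in fields)
    return matches / len(fields)

def fused_similarity(bm25f_score: float, lsi_score: float, onehot_score: float,
                     w_text: float = 0.7, w_cat: float = 0.3) -> float:
    """Fuse text-domain and categorization-domain similarities into one score."""
    text_score = (bm25f_score + lsi_score) / 2.0       # text information domain
    return w_text * text_score + w_cat * onehot_score  # categorization domain

def recommend_duplicates(query_id, candidates, scores, top_k=5):
    """Top-k recommendation list of likely duplicates for one report.
    `scores` is assumed to map (query_id, candidate_id) -> fused similarity."""
    ranked = sorted(candidates, key=lambda c: scores[(query_id, c)], reverse=True)
    return ranked[:top_k]
```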
    Software Feature Extraction Method Based on Overlapping Community Detection
    LIU Chun, ZHANG Guo-liang
    Computer Science    2019, 46 (12): 201-207.   DOI: 10.11896/jsjkx.181001856
    Extracting software features from natural language product descriptions has gained a lot of attention in recent years. Given that the sentences in the descriptions describe the semantics of software features more precisely, and that one sentence may concern more than one software feature, this paper proposed a feature identification method that detects overlapping clusters of the sentences in the natural language descriptions. Based on the overlapping community detection algorithm (LMF), the proposed method defines a metric to measure the similarity between each pair of sentences in the descriptions, builds a sentence similarity network accordingly, and then detects the overlapping sentence communities in this network. Each sentence community is a cluster that implies one software feature and contains all the sentences potentially describing that feature. Further, to help people better understand the characteristics of the sentence communities, the proposed method designs corresponding algorithms to select, in turn, the communities with the lowest entropy from all sentence communities, and to select from each selected community the most representative sentences not already selected by other communities as descriptors of the feature contained in the community. Natural language product descriptions crawled from Softpedia.com were used as experimental data. Experimental results show that the proposed method performs better in accuracy and time consumption.
    Multi-objective Test Case Prioritization Method Combined with Dynamic Reduction
    ZHANG Na, XU Hai-xia, BAO Xiao-an, XU Lu, WU Biao
    Computer Science    2019, 46 (12): 208-212.   DOI: 10.11896/jsjkx.181102106
    Aiming at the shortcomings of the ant colony algorithm in solving the multi-objective test case prioritization (MOTCP) problem, such as a slow convergence rate and a tendency to fall into local optima, a dynamic multi-objective test case prioritization method with online updating of ant colony pheromones was proposed. The method introduces a dynamic reduction idea. Firstly, the initial test case set covering the same requirements is reduced according to each test case's coverage of the requirements. Secondly, according to whether a test case can detect an error during execution and the severity of the detected error, a method for judging the failure degree of a test case is designed. After each iteration of the ant colony, a second reduction is applied to the test cases in which no error is detected, so as to reduce the number of test cases the ant colony needs to traverse in the next iteration; the two reductions greatly reduce the sorting time. At the same time, in each iteration of the ant colony, by considering the influence of test factor importance, failure degree and actual execution time on the next round of pheromones, an online pheromone updating method that updates the colony simultaneously under these three influence factors is designed, enabling the ant colony to find the next test case faster and more accurately. Finally, this method, the traditional ant colony sorting method and a multi-objective optimization sorting method were applied to several open source software programs for experimental comparison. The simulation results show that the proposed dynamic-reduction prioritization method with online pheromone updating has great advantages in performance indicators such as defect detection capability and effective execution time, and can detect errors of higher severity earlier.
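    As a small illustration of the first reduction step described above (not the paper's exact procedure), the sketch below drops test cases whose covered requirements are subsumed by another case; the data structure is an assumed mapping from test-case id to covered requirements.

```python
# Illustrative first-reduction sketch: remove test cases whose requirement
# coverage is subsumed by an already-kept case. Input format is assumed.

def reduce_by_coverage(coverage: dict) -> list:
    """coverage maps test-case id -> set of requirements it covers."""
    kept = []
    # Consider larger covers first so subsumed cases are removed.
    for tc in sorted(coverage, key=lambda t: len(coverage[t]), reverse=True):
        if not any(coverage[tc] <= coverage[k] for k in kept):
            kept.append(tc)
    return kept

suite = {"t1": {"r1", "r2"}, "t2": {"r1"}, "t3": {"r2", "r3"}}
print(reduce_by_coverage(suite))  # ['t1', 't3'] -- t2 is subsumed by t1
```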
    Consistency Checking Algorithm for Distributed Key-Value Database Based on Operation History Graph
    LIAO Bin, ZHANG Tao, LI Min, YU Jiong, GUO Bing-lei, LIU Yan
    Computer Science    2019, 46 (12): 213-219.   DOI: 10.11896/jsjkx.181102097
    The replica mechanism of a distributed database system not only improves the reliability and performance of the overall system, but also introduces the consistency problem of multi-replica data management. To keep data consistent, a consistency protocol model is needed to avoid data inconsistency events; moreover, consistency checking algorithms are needed to detect inconsistent data. Firstly, the concepts of temporal relations, security consistency and concurrent consistency between read and write operations are defined. Secondly, according to the parallel and temporal relationships between read and write operations recorded in the operation set, rules for transforming an operation record set into an operation record graph are extracted, and an algorithm implementing this transformation is designed. Then, taking the operation record graph as input, a violation search algorithm is designed to find the set of inconsistent read operations that violate security and concurrent consistency. Finally, experiments are conducted on Cassandra with the read-write consistency level set to ONE, and YCSB generates parallel read-write stress tests. Comparative experiments with similar algorithms verify the advantages of the proposed algorithm in both functionality and efficiency.
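    For orientation only, here is a hedged sketch of one check the abstract implies: a read that starts after a write has completed but returns neither that write's value nor any concurrent write's value observes stale data. The record fields and the exact rule are assumptions; the paper works on an operation record graph rather than this flat scan.

```python
# Assumed operation-history check for stale reads; a simplification of the
# paper's graph-based violation search.

from dataclasses import dataclass

@dataclass
class Op:
    kind: str      # "read" or "write"
    key: str
    value: int
    start: float   # operation start time
    end: float     # operation end time

def violating_reads(history: list) -> list:
    bad = []
    for r in (op for op in history if op.kind == "read"):
        prior = [w for w in history
                 if w.kind == "write" and w.key == r.key and w.end < r.start]
        if not prior:
            continue
        latest = max(prior, key=lambda w: w.end)
        concurrent = {w.value for w in history
                      if w.kind == "write" and w.key == r.key
                      and w.start < r.end and w.end > r.start}
        if r.value != latest.value and r.value not in concurrent:
            bad.append(r)   # stale value observed: consistency violated
    return bad
```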
    Empirical Study of Code Query Technique Based on Constraint Solving on StackOverflow
    CHEN Zheng-zhao, JIANG Ren-he, PAN Min-xue, ZHANG Tian, LI Xuan-dong
    Computer Science    2019, 46 (11): 137-144.   DOI: 10.11896/jsjkx.191100501C
    Code query plays an important role in code reuse, and the Q&A about code on StackOverflow, a professional question-and-answer site for programmers, is a typical code reuse scenario. In practice, questions are answered manually, which usually suffers from poor timeliness, incorrect problem descriptions and low availability of answers. If the process of code query and search can be automated to replace manual answering, a lot of manpower and time can be saved. Many code query technologies already exist, but most lack experience of application to real cases. Based on the ideas of Satsy, this paper implemented a constraint-solving-based code query technology for the Java language and designed an empirical study. Taking StackOverflow as the research object, it mainly studied how to apply the constraint-solving-based code query technology to the Q&A about code on the website. First of all, the questions on the website were analyzed, and 35 high-traffic Java questions were extracted as query problems. Then, about 30000 lines of code were captured from GitHub, converted into constraint form, and built into a large code base to support code query. Finally, through analysis of the query results of these 35 questions, the practical effect of the technology on StackOverflow was evaluated. The results show that the proposed technology works well on the studied questions and code scale, and can replace manual answering at a considerable scale.
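    To make the Satsy-style idea concrete, here is a toy sketch with the z3 solver: each snippet is pre-encoded as a formula relating input x to output y, and a query is an input/output example; a snippet matches if the conjunction is satisfiable. The three hand-written encodings are illustrative assumptions, not the paper's corpus.

```python
# Toy constraint-solving code search with z3 (pip install z3-solver).

from z3 import Int, Solver, sat

x, y = Int("x"), Int("y")
snippets = {                 # assumed pre-computed symbolic encodings
    "square": y == x * x,
    "double": y == x + x,
    "negate": y == -x,
}

def matching_snippets(inp: int, out: int) -> list:
    """Return names of snippets that can map `inp` to `out`."""
    hits = []
    for name, encoding in snippets.items():
        s = Solver()
        s.add(encoding, x == inp, y == out)
        if s.check() == sat:
            hits.append(name)
    return hits

print(matching_snippets(5, 25))   # ['square']
print(matching_snippets(2, 4))    # ['square', 'double']
```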
    Method of Microservice System Debugging Based on Log Visualization Analysis
    LI Wen-hai, PENG Xin, DING Dan, XIANG Qi-lin, GUO Xiao-feng, ZHOU Xiang, ZHAO Wen-yun
    Computer Science    2019, 46 (11): 145-155.   DOI: 10.11896/jsjkx.181102210
    In the era of cloud computing, more and more enterprises are adopting the microservice architecture for software development or for transforming traditional monolithic applications. However, a microservice system is highly complex and dynamic, and when it fails there is currently no method or tool that effectively supports locating the root cause of the failure. To this end, the paper first proposed that all business logs generated across services by a single request can be associated through trace information. On this basis, it studied a method of microservice system debugging based on log visualization analysis. Firstly, a microservice log model is defined, which specifies the data required for log visualization analysis. Then five visual debugging strategies are summarized to support locating the root causes of four typical kinds of microservice faults: ordinary faults with exceptions, logical faults without exceptions, faults caused by unexpected asynchronous service invocation sequences, and faults caused by multiple service instances. The strategies are: single trace with log information, comparison of different traces, service asynchronous invocation analysis, service multi-instance analysis, and trace segmentation. To realize service asynchronous invocation analysis and service multi-instance analysis, this paper designed two algorithms. A prototype tool named LogVisualization was also designed and implemented. LogVisualization collects the log information, trace data, node information and service instance information generated by the microservice system at runtime, associates the business logs with trace information with little code intrusion, and supports users in applying the five strategies for visual debugging. Finally, the prototype tool was applied to a real microservice system; compared with existing tools (Zipkin+ELK), its usefulness and effectiveness in locating the root causes of the four kinds of microservice faults are verified.
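    A minimal sketch of the association step this method relies on, assuming each log line carries trace_id, timestamp, service and message fields (the exact log schema is the paper's, not shown here): group lines by trace id, order them, and crudely diff a passing trace against a failing one.

```python
# Assumed log schema; grouping and a naive trace diff for visual inspection.

from collections import defaultdict

def group_by_trace(log_lines: list) -> dict:
    """Group business log lines by shared trace_id, ordered by timestamp."""
    traces = defaultdict(list)
    for line in log_lines:
        traces[line["trace_id"]].append(line)
    for lines in traces.values():
        lines.sort(key=lambda l: l["timestamp"])   # one request's full story
    return dict(traces)

def diff_traces(ok: list, failed: list) -> list:
    """Crude comparison of two traces by (service, message) event pairs."""
    a = [(l["service"], l["message"]) for l in ok]
    b = [(l["service"], l["message"]) for l in failed]
    return [(x, y) for x, y in zip(a, b) if x != y]
```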
    Cost-sensitive Convolutional Neural Network Model for Software Defect Prediction
    QIU Shao-jian, CAI Zi-yi, LU Lu
    Computer Science    2019, 46 (11): 156-160.   DOI: 10.11896/jsjkx.191100502C
    Machine-learning-based software defect prediction methods have received wide attention from researchers in the field of software engineering. The defect distribution in software can be analyzed by a defect prediction model, helping the software quality assurance team detect potential software errors and allocate test resources reasonably. However, most existing defect prediction methods are based on hand-crafted features such as lines of code, dependencies between modules and stack reference depth. These methods do not take into account the potential semantic features of the software source code and may therefore predict poorly. To solve this problem, this paper applied convolutional neural networks to mine the semantic features implicit in source code, using a three-layer convolutional neural network to extract abstract features from the data. For handling data imbalance, this paper adopted a cost-sensitive method that gives different weights to positive and negative examples, balancing their impact on model training. As experimental data, this paper selected multiple versions of eight software systems from the PROMISE defect dataset, 19 projects in total. For model comparison, this paper compared the proposed cost-sensitive convolutional-neural-network-based software defect prediction model (CS-TCNN) with logistic regression and a deep belief network. The evaluation metrics are AUC and MCC, which are widely used in defect prediction research. The experimental results demonstrate that CS-TCNN effectively extracts the semantic features in program code and improves the performance of software defect prediction.
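    The sketch below illustrates the cost-sensitive ingredient with a small three-convolutional-layer network in PyTorch. The layer sizes, the 2:1 cost ratio and the input encoding (already-embedded token sequences) are assumptions for illustration, not CS-TCNN's published architecture.

```python
# Cost-sensitive 1D CNN sketch in PyTorch; hyperparameters are assumptions.

import torch
import torch.nn as nn

class TinyDefectCNN(nn.Module):
    def __init__(self, emb_dim=32):
        super().__init__()
        self.conv = nn.Sequential(                # three convolutional layers
            nn.Conv1d(emb_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(32, 2)                # clean / defective

    def forward(self, x):                         # x: (batch, emb_dim, seq_len)
        return self.fc(self.conv(x).squeeze(-1))

model = TinyDefectCNN()
# Cost sensitivity: weight the minority (defective) class more heavily.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))
x = torch.randn(8, 32, 100)                       # dummy batch of 8 modules
loss = loss_fn(model(x), torch.randint(0, 2, (8,)))
loss.backward()
```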
    Stochastic TBFL Approach Based on Calibration Factor
    WANG Zhen-zhen, LIU Jia
    Computer Science    2019, 46 (11): 161-167.   DOI: 10.11896/jsjkx.191100503C
    Approaches to fault localization based on test suites are now collectively called TBFL (Testing Based Fault Localization). However, current algorithms do not take advantage of the prior knowledge about test cases and the program, wasting these valuable "resources". Literature [12] introduced a new kind of stochastic TBFL approach whose spirit is to combine prior knowledge with actual testing activities under stochastic theory so as to locate program faults. That algorithm may be regarded as a general pattern of this kind of approach, from which various algorithms can be developed. The approach presented in this paper simplifies that TBFL algorithm. It revises the prior probability of the program variable X from the separate testing activity of each test case: if there are n test cases, n calibration factors are obtained; these n calibration factors are then added and standardized, and finally the posterior probability of the program is obtained. The proposed approach is called a stochastic TBFL approach precisely because it depends on a calibration factor matrix. This paper also presented three standards for comparing different TBFL approaches; measured by these standards, the improved approach is feasible in some instances.
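    A highly schematic numeric reading of the calibration-factor idea, to fix intuition only: each test case contributes one calibration factor per statement, the n factors are summed, and the prior suspiciousness is rescaled and normalized into a posterior. The exact definition of a factor is in the paper; the update rule below is an assumption.

```python
# Assumed calibration-factor update; numbers are invented.

import numpy as np

prior = np.array([0.2, 0.5, 0.3])        # prior suspiciousness per statement
factors = np.array([                     # one row per test case (n = 2)
    [1.0, 2.0, 0.5],
    [0.8, 1.5, 1.0],
])
calibration = factors.sum(axis=0)        # add the n calibration factors
posterior = prior * calibration
posterior /= posterior.sum()             # standardize to a distribution
print(posterior.round(3))                # most suspicious statement wins
```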
    Modified Neural Language Model and Its Application in Code Suggestion
    ZHANG Xian, BEN Ke-rong
    Computer Science    2019, 46 (11): 168-175.   DOI: 10.11896/jsjkx.191100504C
    Language models characterize the occurrence probabilities of text segments. As an important class of models in natural language processing, they have been widely used in different software analysis tasks in recent years. To enhance the ability to learn code features, this paper proposed a modified recurrent neural network language model, called CodeNLM. By analyzing source code sequences represented in embedding form, the model captures regularities in code and estimates the joint probability distribution of the sequences. Considering that existing models learn only the code data and do not fully utilize the available information, this paper proposed an additional-information guidance strategy, which improves the ability to characterize code regularities with the assistance of non-code information. Aiming at the characteristics of the language modeling task, a layer-by-layer incremental node-setting strategy is proposed, which optimizes the network structure and improves the effectiveness of information transmission. In verification experiments on 9 Java projects with 2.03M lines of code, the perplexity of CodeNLM is clearly better than that of the contrasted n-gram models and neural language models. In a code suggestion task, the average accuracy (MRR) of the proposed model is 3.4%~24.4% higher than that of the contrasted methods. The experimental results show that, besides possessing a strong capability for learning long-distance information, CodeNLM can effectively model programming languages and performs code suggestion well.
    Ensemble Model for Software Defect Prediction
    HU Meng-yuan, HUANG Hong-yun, DING Zuo-hua
    Computer Science    2019, 46 (11): 176-180.   DOI: 10.11896/jsjkx.180901685
    Software defect prediction aims to identify defective modules effectively. Traditional classifiers predict well on class-balanced data, but when the class proportions are unbalanced they incline toward the majority classes, easily misclassifying minority-class modules. In reality, the data in software defect prediction are often unbalanced. To deal with this class imbalance problem, this paper proposed an ensemble model based on improved class weight self-adaptation, soft voting and threshold moving. The model addresses class imbalance in both the training stage and the decision stage without changing the original data sets. Firstly, in the class weight learning stage, the optimal weights of the different classes are obtained through class weight adaptive learning. Then, in the training stage, three base classifiers are trained with the optimal weights obtained in the previous step and combined by soft voting. Finally, in the decision stage, the decision is made according to the threshold moving model to get the final predicted class. To prove the validity of the proposed method, predictions were made on the NASA and Eclipse software defect standard data sets, and the method was compared with several software defect prediction methods proposed in recent years on the recall rate Pd, false positive rate Pf and F1 measure (F-measure). The experimental results show that the recall rate Pd and the F-measure of the proposed method improve by 0.09 and 0.06 on average, respectively. Therefore, the overall performance of the proposed method in dealing with class imbalance in software defect prediction is superior to that of other software defect prediction methods.
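    The three-stage idea maps naturally onto scikit-learn, as the hedged sketch below shows: class-weighted base learners, soft-voting combination, then threshold moving at decision time. The choice of base learners, the stand-in weight and the 0.35 threshold are illustrative assumptions; the paper learns its class weights adaptively.

```python
# Sketch: class weights + soft voting + threshold moving with scikit-learn.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

w = {0: 1.0, 1: 4.0}                  # stand-in for the learned class weights
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(class_weight=w, max_iter=1000)),
        ("dt", DecisionTreeClassifier(class_weight=w)),
        ("rf", RandomForestClassifier(class_weight=w, n_estimators=100)),
    ],
    voting="soft",                    # average predicted probabilities
)
# ensemble.fit(X_train, y_train) first; X_train / y_train are assumed given.

def predict_with_threshold(model, X, threshold=0.35):
    """Threshold moving: lower the bar for the defective (minority) class."""
    proba = model.predict_proba(X)[:, 1]
    return (proba >= threshold).astype(int)
```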
    Storage and Query Model for Localized Search on Temporal Graph Data
    ZHAO Ping, SHOU Li-dan, CHEN Ke, CHEN Gang, WU Xiao-fan
    Computer Science    2019, 46 (10): 186-194.   DOI: 10.11896/jsjkx.19100530C
    Temporal graph data is graph-structured data in which entities are related to each other and both entity attributes and the relationships between entities change frequently. This model is applicable to representing product and user relationships in e-commerce, knowledge graphs that contain history, and corporate organizational structure management. Aiming at the challenge of establishing a general storage scheme for time-varying graph data, this paper proposed a localized-query-oriented scheme for storing and retrieving time-varying graph data, which combines the graph-traversal strengths of graph databases with the advantages of distributed key-value databases, achieving a universal and expressive representation for storing graph data. Experimental results show that the system has significant advantages in the storage of historical attributes.
    Foreign Key Detection Algorithm for Web Tables Based on Conflict Dependency Elimination
    WANG Jia-min, WANG Ning
    Computer Science    2019, 46 (10): 195-201.   DOI: 10.11896/jsjkx.180901748
    As one of the most important constraints in databases, foreign key relationships between tables are crucial for data integration and analysis. For the large number of web tables, foreign keys are in most cases not specified, so foreign key detection becomes a significant step in understanding and utilizing web tables. Current research mainly focuses on the search for inclusion dependencies between attributes, and foreign key detection methods for traditional relational tables cannot handle the large number of conflicting foreign keys caused by the heterogeneity of web tables. Considering the conflict dependencies between web tables, a foreign key detection algorithm for web tables based on conflict dependency elimination was proposed. Firstly, the concept of conflict dependency is introduced, and an inclusion dependency graph is established for the candidate foreign key relationships. Subsequently, the layer structure of the inclusion dependency graph is constructed, and the strength of candidates is defined. Finally, by eliminating conflict dependencies layer by layer, the true foreign key relationships are obtained. To verify the effectiveness of the algorithm, this paper selected 3 real web table datasets as experimental data: the WIKI dataset with complete schema specification, and the DWTC and WDC datasets without schema information. The proposed algorithm was compared with two other foreign key detection methods on these datasets in terms of accuracy, recall and F-measure. The experimental results show that the proposed algorithm is superior to existing algorithms in accuracy, recall and F-measure on WIKI and DWTC, and performs especially well on WDC, the largest and most up-to-date web table corpus, where its accuracy, recall and F-measure reach 0.89, 0.88 and 0.89. Therefore, the proposed foreign key detection algorithm is more suitable for web tables.
    Data Replicas Distribution Transition Strategy in Cloud Storage System
    WU Xiu-guo, LIU Cui
    Computer Science    2019, 46 (10): 202-208.   DOI: 10.11896/jsjkx.180901623
    Replication is a common method for improving data access reliability and system fault tolerance in cloud storage systems, and dynamically rescheduling the replica distribution according to changes in user requirements and the environment is one of the most important topics in replica management. However, current replica redistribution strategies mostly focus on the new replica scheme, such as the number of replicas and their placements, on the premise that the transition can be completed automatically, without taking into account the task scheduling problem in practice. In fact, data replica distribution transition is a complex scheduling problem involving replica migration and deletion among data centers. In addition, the disk space and time required by different scheduling strategies differ greatly, leading to big differences in cost and efficiency. Therefore, this paper first proposed a data replica distribution transition model for the cloud storage environment together with a feasibility analysis, gave a definition of the minimum-cost data replica distribution transition problem, and proved its complexity based on the 0-1 knapsack problem. Besides a random strategy, three transition strategies (MTCF, MOCF and MTCFSD) were given from the minimum-cost point of view. Finally, a series of experiments were performed on the CloudSim simulation platform. The results show that the number of transmissions is reduced by nearly 60% and the transmission cost by nearly 50% compared with other methods, indicating the proposed method's reliability and effectiveness in further improving cloud storage system performance.
    Evaluation Model of Software Quality with Interval Data
    YUE Chuan, PENG Xiao-hong
    Computer Science    2019, 46 (10): 209-214.   DOI: 10.11896/jsjkx.180801554
    To overcome the defects of traditional evaluation methods, a new evaluation model of software quality was developed in this paper. First, aiming at the problems of existing projection measures, a new normalized projection measure is provided. Second, an evaluation model of software quality with interval data is established, based on the new projection measure and the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) technique. An assessment procedure is then elaborated in a group decision-making setting, involving evaluation matrices, weighted evaluation matrices, positive and negative ideal decisions, and relative closeness. The evaluation information comes from a questionnaire survey. Finally, the effectiveness and feasibility of the developed method are illustrated by a practical example and an experimental analysis. The experimental results show that the evaluation model has advantages in robustness and practicability.
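    For readers unfamiliar with TOPSIS, the sketch below shows the plain crisp variant the paper builds on; the paper's contributions (a new normalized projection measure and interval-valued ratings) are deliberately omitted, and all numbers are invented.

```python
# Plain TOPSIS sketch with numpy; benefit-type criteria assumed throughout.

import numpy as np

X = np.array([[7.0, 8.0, 6.0],        # candidate software systems (rows)
              [6.0, 9.0, 7.0],        # vs. quality criteria (columns)
              [8.0, 6.0, 8.0]])
w = np.array([0.5, 0.3, 0.2])         # criterion weights

V = w * X / np.linalg.norm(X, axis=0) # weighted, vector-normalized matrix
ideal, anti = V.max(axis=0), V.min(axis=0)   # positive / negative ideal
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)   # relative closeness, higher is better
print(closeness.argsort()[::-1])      # ranking of the candidates
```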
    Embedded Software Reliability Model and Evaluation Method Combining AADL and Z
    LI Mi, ZHUANG Yi, HU Xin-wen
    Computer Science    2019, 46 (8): 217-223.   DOI: 10.11896/j.issn.1002-137X.2019.08.036
    In the early stage of embedded software development, a reliability model is established to discover problems in the software design as early as possible, thereby saving development costs. AADL models software reliability from two aspects: software structure and fault propagation. However, the semi-formal nature of AADL makes it difficult to analyze and verify non-functional attributes such as reliability and security. The formal specification language Z has a strong logical description ability and can accurately express various constraints in software, so a reliability model based on Z can be rigorously analyzed and verified. Therefore, considering the characteristics of AADL and Z, an embedded software reliability model combining Z and AADL (ZARM) was proposed. The modeling methods of the ZARM fault model, structure model and behavior model were given, and the data constraints related to reliability were described in predicates. Based on the ZARM model, a probabilistic DTMC-based reliability evaluation method was proposed to quantitatively evaluate and analyze the ZARM model. Finally, the process of reliability modeling with ZARM was illustrated by a flight management system (FMS), and the reliability evaluation was carried out with the proposed method. The comparison between the evaluation results and those of reference [19] shows the correctness and effectiveness of the proposed method.
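    As a hedged illustration of what a DTMC-based reliability evaluation computes, the sketch below uses a made-up three-component absorbing Markov chain: system reliability is the probability of absorption in the success state from the start state. The transition probabilities are invented, not taken from the FMS case study.

```python
# Absorbing-DTMC reliability sketch; the chain below is an invented example.

import numpy as np

# States: 0=C1, 1=C2, 2=C3 (transient), 3=Success, 4=Failure (absorbing).
P = np.array([
    [0.0, 0.9, 0.0, 0.00, 0.10],   # C1 hands off to C2 or fails
    [0.0, 0.0, 0.9, 0.00, 0.10],   # C2 hands off to C3 or fails
    [0.0, 0.0, 0.0, 0.95, 0.05],   # C3 completes or fails
    [0.0, 0.0, 0.0, 1.00, 0.00],
    [0.0, 0.0, 0.0, 0.00, 1.00],
])
Q, R = P[:3, :3], P[:3, 3:]        # transient-to-transient / -to-absorbing
N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix
B = N @ R                          # absorption probabilities
print(f"system reliability = {B[0, 0]:.4f}")   # 0.9*0.9*0.95 = 0.7695
```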
    Method for Identifying and Recommending Reconstructed Clones Based on Software Evolution History
    SHE Rong-rong, ZHANG Li-ping
    Computer Science    2019, 46 (8): 224-232.   DOI: 10.11896/j.issn.1002-137X.2019.08.037
    Existing research on clone code reconstruction is limited to static analysis of a single version and ignores the evolution process of the cloned code, so effective methods for reconstructing cloned code are lacking. Therefore, this paper firstly extracted evolution history information closely related to clone code from clone detection, clone mapping, clone families and the software maintenance log management system. Secondly, the clone code that needs to be reconstructed was identified, and traceable clone code was identified at the same time. Then, static features and evolution features were extracted and a feature sample database was built. Finally, several machine learning methods were compared to select the best classifier for recommending clones for reconstruction. Experiments were performed on nearly 170 versions of 7 software systems. The results show that the accuracy of recommending clones for reconstruction is above 90%, which provides more accurate and reasonable code reconstruction suggestions for software development and maintenance personnel.
    Test Case Prioritization Method Based on AHP for Regression Testing
    FENG Shen-feng, GAO Jian-hua
    Computer Science    2019, 46 (8): 233-238.   DOI: 10.11896/j.issn.1002-137X.2019.08.038
    Test case prioritization methods sort test cases based on specific criteria to improve test efficiency. Considering that existing techniques are limited to a single objective or a few influencing factors, which hinders comprehensive analysis and evaluation of test cases, this paper proposed a test case prioritization method based on the analytic hierarchy process (AHP). Taking optimization of the test case sequence as the goal, the influencing factors as the criteria and the test cases as the alternatives, the method constructs a hierarchical structure model and judgment matrices, then sorts the test cases, carries out the consistency check, and optimizes the ratios of the influencing factors. The experiments use Matlab and take APFD as the evaluation metric. Experimental results show that, compared with other existing prioritization methods, this method achieves a higher APFD value of 85% and improves test efficiency. In addition, the number of influencing factors can be increased according to actual requirements, making the method flexible.
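    The AHP core used by such a method is compact enough to sketch: derive criterion weights from a pairwise judgment matrix via its principal eigenvector and check consistency (CR < 0.1). The 3x3 judgment matrix below is an invented example, not the paper's.

```python
# AHP weight derivation + consistency check; judgment matrix is invented.

import numpy as np

A = np.array([[1.0, 3.0, 5.0],     # pairwise importance of three influencing
              [1/3, 1.0, 3.0],     # factors, e.g. coverage, cost, severity
              [1/5, 1/3, 1.0]])

vals, vecs = np.linalg.eig(A)
k = vals.real.argmax()
weights = np.abs(vecs[:, k].real)
weights /= weights.sum()           # priority vector for the criteria

n = A.shape[0]
CI = (vals.real.max() - n) / (n - 1)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]   # random index table
print(weights.round(3), "CR =", round(CI / RI, 3))     # CR < 0.1 => consistent
```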
    Priority Ranking Method of Test Cases Based on Fault Location
    CHEN Jing, SHU Qiang, XIE Hao-fei
    Computer Science    2019, 46 (8): 239-243.   DOI: 10.11896/j.issn.1002-137X.2019.08.039
    Protocol conformance testing verifies whether the implementation under test is consistent with the standard protocol specification, ensuring that equipment or systems implementing the protocol can interconnect and interoperate. In the process of debugging, upgrading and repairing the equipment under test, it is often necessary to re-execute all test cases to ensure the completeness of protocol conformance testing, and to test and repair repeatedly until the protocol implementation fully conforms to the protocol standard specification. In each regression round, executing all test cases in the suite without a strategy increases the testing workload: only after all test cases have finished can it be determined whether a detected failure has been repaired correctly or whether new failures have appeared. As a result, test cases that can detect faults cannot be executed as early as possible, testing cannot focus on the error-prone parts, and the cost of test execution is large, which affects test efficiency. Therefore, in protocol conformance testing, how to optimize the huge test case set, reduce the test cost, detect faults in the system as early as possible with as few test cases as possible under the premise of meeting the test requirements, and improve the fault detection rate has become an urgent problem. Based on research into existing test case prioritization methods, this paper improved a fault-location-based test case prioritization algorithm to improve the efficiency of fault detection. Combined with the dependencies between test requirements, the sequence is adjusted dynamically, and test cases with high error-detection probability are selected dynamically. The algorithm is verified on a protocol conformance test system for wireless sensor networks. Compared with the Additional and FTP algorithms, its average percentage of fault detection (APFD) and test cost (TCFD) improve by at least 9.2% and 7.6% respectively.
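    Since APFD is the headline metric here (and in the AHP paper above), a short sketch of its standard definition may help: APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where TFi is the 1-based position of the first test case revealing fault i, for n test cases and m faults. The fault-detection data below is invented.

```python
# Standard APFD computation; the detects mapping is an invented example.

def apfd(order: list, detects: dict) -> float:
    """order: prioritized test ids; detects: test id -> set of faults found."""
    faults = set().union(*detects.values())
    n, m = len(order), len(faults)
    first = {}
    for pos, tc in enumerate(order, start=1):
        for f in detects.get(tc, ()):
            first.setdefault(f, pos)       # position of first detection
    return 1 - sum(first[f] for f in faults) / (n * m) + 1 / (2 * n)

detects = {"t1": {"f1"}, "t2": {"f2", "f3"}, "t3": set(), "t4": {"f3"}}
print(apfd(["t2", "t1", "t3", "t4"], detects))   # 0.7917 (rounded)
```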
    Vulnerability Discovery Approach Based on Similarity Matching of Program Slicing
    LIU Qiang, KUANG Xiao-hui, CHEN Hua, LI Xiang, LI Guang-ke
    Computer Science    2019, 46 (7): 126-132.   DOI: 10.11896/j.issn.1002-137X.2019.07.020
    Vulnerability analysis based on similarity matching is one of the most effective vulnerability analysis methods. Reducing the false positive rate without raising the false negative rate, and increasing efficiency, are the main optimization goals. Aiming at these challenges, this paper proposed an optimized vulnerability analysis framework based on similarity matching of program slices, covering code slicing, feature extraction and vectorization centered on vulnerability key points. The core idea of the framework is to take the vulnerability semantic context slice of known vulnerable code as a reference, compute the similarity between slices of the code under test and the vulnerable sample slices, and judge the likelihood of a vulnerability. This paper implemented the framework and validated it on open source projects with known vulnerabilities. Compared with existing research, the slice-based similarity framework describes the vulnerability context more closely, and the slicing technique optimizes similarity-matching-based vulnerability discovery. The proposed framework and method are verified to effectively reduce both the false positive rate and the false negative rate of vulnerability discovery.
    Automatic Vulnerability Detection and Test Cases Generation Method for Vulnerabilities Caused by SEH
    HUANG Zhao, HUANG Shu-guang, DENG Zhao-kun, HUANG Hui
    Computer Science    2019, 46 (7): 133-138.   DOI: 10.11896/j.issn.1002-137X.2019.07.021
    Structured Exception Handling (SEH), offered by the Windows operating system, is a way to handle program errors or exceptions. However, because SEH handles exceptions through a linked chain, corresponding vulnerabilities may arise. To solve this problem and improve program security, a method was proposed to generate test cases for SEH-based vulnerabilities. First, the method judges whether the program is at risk of being attacked via SEH. If there is a risk, test case constraints are constructed and adjusted; then, by solving these constraints, the corresponding test cases are generated automatically. On the one hand, this method extends the current pattern of automatic test case generation; on the other hand, it can generate effective test cases even when GS protection is turned on. Finally, the effectiveness of the method is verified by experiments.
    Matrix Formalization Based on Coq Record
    MA Zhen-wei, CHEN Gang
    Computer Science    2019, 46 (7): 139-145.   DOI: 10.11896/j.issn.1002-137X.2019.07.022
    Matrices are widely used in engineering systems, and the correctness of matrix operations has an important impact on the reliability of those systems. Coq is a powerful higher-order theorem prover based on the typed λ-calculus. Although the Coq type system can describe variable-sized dynamic data types well, it lacks a satisfactory description mechanism for data types such as fixed-size vectors and matrices. At present, there is no vector or matrix library in the Coq standard library, so it is difficult to use Coq to formally verify theorems or algorithms involving matrices. To solve these problems, this paper proposed a matrix implementation based on the Record type, defined a set of basic matrix functions and proved their basic properties. Verification of a flight control transformation matrix can be done easily with the matrix types and related lemmas provided in this paper. Compared with other matrix implementations, this method is not only relatively simple to implement, but also simpler and more convenient to use.
    Test Case Generation Method Based on Particle Swarm Optimization Algorithm
    ZHANG Na, TENG Sai-na, WU Biao, BAO Xiao-an
    Computer Science    2019, 46 (7): 146-150.   DOI: 10.11896/j.issn.1002-137X.2019.07.023
    To solve the problems of premature convergence and easily falling into local extrema in standard particle swarm optimization, this paper put forward a particle swarm optimization algorithm based on reverse-learning and search-again for test case generation. Firstly, the learning factors are improved by a nonlinearly decreasing inertia weight function to realize a preliminary search of the population, and the gradient descent method is used to search again around the optimal and suboptimal solutions. Secondly, taboo areas centered on the extreme points are set, and population diversity is improved by reverse learning of the particles outside the taboo regions. Finally, the branch distance method is used to construct a fitness function to evaluate the quality of test cases. Experimental results show that the proposed method has advantages in coverage, number of iterations and defect detection rate.
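    To illustrate the branch-distance fitness idea named above: for a target branch such as `if a == b and b > c`, the distance measures how close an input comes to taking the branch, and fitness can be its normalized inverse. The target predicate and the penalty constant K below are illustrative assumptions, not the paper's benchmark.

```python
# Branch-distance fitness sketch for a target branch `if a == b and b > c`.

K = 1.0  # penalty constant for an unsatisfied relational condition

def branch_distance(a: float, b: float, c: float) -> float:
    d_eq = abs(a - b)                          # distance for a == b
    d_gt = 0.0 if b > c else (c - b) + K       # distance for b > c
    return d_eq + d_gt                         # conjunction: sum the parts

def fitness(candidate) -> float:
    d = branch_distance(*candidate)
    return 1.0 / (1.0 + d)                     # 1.0 iff the branch is covered

print(fitness((3, 3, 1)))   # 1.0   -> branch taken
print(fitness((3, 5, 9)))   # 0.125 -> far from satisfying the predicate
```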
    Analysis on Behavior Characteristics of Developers in Github
    LI Cun-yan, HONG Mei
    Computer Science    2019, 46 (2): 152-158.   DOI: 10.11896/j.issn.1002-137X.2019.02.024
    Analyzing the behavior characteristics of developers in the open source environment is one of the important issues in promoting the development of open source communities. Taking data from the Github open source community as the research object, this paper analyzed the factors influencing developer contribution on Github, explored the cooperative relationships between developers using visualization analysis technology, and further dissected the relationship between the regions developers belong to and their collaboration. The study yields phenomena and conclusions of theoretical and practical value, revealing some behavioral characteristics of developers from a new perspective.
    Approach for Generating Class Integration Test Sequence Based on Dream Particle Swarm Optimization Algorithm
    ZHANG Yue-ning, JIANG Shu-juan, ZHANG Yan-mei
    Computer Science    2019, 46 (2): 159-165.   DOI: 10.11896/j.issn.1002-137X.2019.02.025
    Determining the class integration test sequence is an important topic in object-oriented software integration testing. A reasonable class integration test sequence can reduce the overall complexity of test stubs and thus reduce test cost. The particle swarm optimization algorithm is prone to premature convergence, so a class integration test sequence determination method based on a dream particle swarm optimization algorithm was proposed. First, each sequence is taken as a particle in one-dimensional space. Then every particle is considered a dreamer, and each iteration cycle is divided into two phases, day and night: in the daytime, particles move to new locations, and during the night they adjust the locations gained in the day phase according to their dreaming ability. In this way a particle has the opportunity to search near its current location, so that the algorithm converges slowly and avoids falling into local optima too early. The experimental results show that the proposed approach incurs a lower test cost in most cases.
    Study on Fractal Features of Software Networks
    PAN Hao, ZHENG Wei, ZHANG Zi-feng, LU Chao-qun
    Computer Science    2019, 46 (2): 166-170.   DOI: 10.11896/j.issn.1002-137X.2019.02.026
    With the development of Internet technology, the scale of software architecture keeps growing with the requirements, and software functions change the structure of the software network. The fractal structure of a software network reflects the self-similarity between the modules and the whole software network, and allows the architecture of software to be analyzed at the code level. This paper researched the fractal characteristics of software networks. First, software networks are weighted by the complex relationships between classes. Furthermore, a network-centrality-based box algorithm is used to calculate the fractal dimension of the software network. Finally, two representative software frameworks, Spring and Struts2, are analyzed through experiments. The results show that both frameworks and their modules have fractal features, and that the fractal dimension of a software network increases with the complexity of the module: the fractal dimension of a software network with more comprehensive functions is bigger than that of one with specialized functions, and the fractal dimension also increases as the software version evolves and the software functions are gradually improved.
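    A brief sketch of how a fractal dimension is usually estimated from box-covering results: fit the slope of log N_B versus log l_B, where N_B is the number of boxes of size l_B needed to cover the network. The counts below are invented; the paper obtains them with a centrality-based box algorithm.

```python
# Box-counting dimension estimate via log-log regression; data is invented.

import numpy as np

box_sizes = np.array([1, 2, 3, 4, 6, 8])           # l_B
box_counts = np.array([512, 160, 80, 46, 24, 14])  # N_B from box covering

slope, intercept = np.polyfit(np.log(box_sizes), np.log(box_counts), 1)
fractal_dimension = -slope                          # N_B ~ l_B ** (-d_B)
print(f"estimated fractal dimension d_B = {fractal_dimension:.2f}")
```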
    Geo-semantic Data Storage and Retrieval Mechanism Based on CAN
    LU Hai-chuan, FU Hai-dong, LIU Yu
    Computer Science    2019, 46 (2): 171-177.   DOI: 10.11896/j.issn.1002-137X.2019.02.027
    Semantic technology can search information more intelligently and accurately and assist researchers in making scientific decisions. This technology has therefore been introduced into geographic information processing, forming GeoSPARQL, a geo-query language based on RDF (Resource Description Framework). However, existing application platforms for geographic semantic information processing adopt centralized storage and retrieval services, which suffer from single-node failure and poor scalability. Although researchers have proposed various methods that use peer-to-peer networks to improve the reliability and scalability of application systems, these methods do not consider the characteristics of geographic semantic data. In view of the above problems, this paper considered the characteristics of geographic semantic data and optimized the storage of semantic data on a peer-to-peer network. It proposed a storage and retrieval scheme based on a Content Addressable Network (CAN), improving the retrieval efficiency of semantic data by mapping triples onto the network according to their positions. The experimental results show that the proposed scheme has good scalability, and its query efficiency for topological relations is superior to that of existing schemes.
    Software Operational Reliability Growth Model Considering End-user Behavior and Fault-correction Time Delay
    YANG Jian-feng, ZHAO Ming, HU Wen-sheng
    Computer Science    2019, 46 (1): 212-218.   DOI: 10.11896/j.issn.1002-137X.2019.01.033
    Most traditional software reliability models assume that the testing environment and the operating environment are the same, that is, that a software reliability model using failure data from the testing phase can predict operational reliability. It is well known that correcting bugs improves software reliability, but another phenomenon also occurs: the failure rate decreases as users become more familiar with the system. In this paper, the inherent fault-detection process (IFDP), inherent fault-correction process (IFCP) and external fault-detection process (EFDP) were discussed, and a software operational reliability growth model considering end-user behavior and fault-correction time delay was proposed. Numerical results on real end-user bug tracking data for open source software show that the proposed model is useful and powerful.
    Code-predicting Model Based on Method Constraints
    FANG Wen-yuan, LIU Yan, ZHU Ma
    Computer Science    2019, 46 (1): 219-225.   DOI: 10.11896/j.issn.1002-137X.2019.01.034
    The state of the art shows that extracting code features from a large amount of source code and building a statistical language model gives good predictive ability for code. However, present methods can still be improved in predicting accuracy, because existing statistical language models often use the text information in the code as feature words, which means the syntactic structure information of the code is not fully utilized. To improve code prediction performance, this paper proposed the concept of the constraint relation between methods. Based on this, it studied the method invocation sequences of Java objects, abstracted code features, and built a statistical language model to perform code prediction. Moreover, it studied the scope of application of the prediction model based on method constraint relations in the Java language. Experiments show that this method improves accuracy by 8% compared with the existing model.
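    To show the kind of statistical language model such work builds over method-call sequences, here is a minimal bigram sketch: given the calls made so far on an object, it predicts the next call. The training sequences and call names are invented; the paper's model additionally encodes method-constraint features.

```python
# Bigram language model over method-call sequences; training data is invented.

from collections import Counter, defaultdict

sequences = [
    ["new", "setTimeout", "connect", "read", "close"],
    ["new", "connect", "read", "read", "close"],
    ["new", "connect", "write", "close"],
]

bigrams = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        bigrams[prev][nxt] += 1           # count call transitions

def suggest(prev_call: str, k: int = 3) -> list:
    """Top-k next-call suggestions with estimated probabilities."""
    counts = bigrams[prev_call]
    total = sum(counts.values())
    return [(c, n / total) for c, n in counts.most_common(k)]

print(suggest("connect"))   # [('read', 0.666...), ('write', 0.333...)]
```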
    Fault Tree Module Expansion Decomposition Method Based on Linear-time Algorithm
    SONG Jun-hua, WEI Ou
    Computer Science    2019, 46 (1): 226-231.   DOI: 10.11896/j.issn.1002-137X.2019.01.035
    Fault trees are widely used to analyze safety in many safety-critical fields, including the nuclear industry, aerospace engineering and traffic control. However, analyzing large-scale fault trees in major industries such as nuclear power stations consumes large amounts of computation resources, leading to low efficiency and excessive time consumption. To solve this problem, this paper made several extensions to the existing linear-time algorithm and proposed new fault tree reduction rules and a decomposition algorithm based on module derivation. Firstly, the concept of equivalent events is presented to increase the number of modules decomposed by the linear-time algorithm. Considering time complexity and resource utilization, new reduction rules are proposed to remove redundant information from the fault tree. Experimental results show that the proposed decomposition method can optimize fault tree analysis effectively, and reduces the time consumption and memory usage when dealing with large-scale fault trees.
    Reverse k Nearest Neighbor Queries in Time-dependent Road Networks
    LI Jia-jia, SHEN Pan-pan, XIA Xiu-feng, LIU Xiang-yu
    Computer Science    2019, 46 (1): 232-237.   DOI: 10.11896/j.issn.1002-137X.2019.01.036
    Most existing efficient algorithms for reverse k nearest neighbor queries focus on Euclidean space or static networks; few study the reverse k nearest neighbor query in time-dependent road networks, and the existing algorithm is inefficient when the density of interest points is sparse or the value of k is large. To address these problems, this paper proposed mTD-SubG, a subnet-division-based reverse k nearest neighbor query algorithm. Firstly, the entire road network is divided into subnets of the same size, which are expanded to other subnets through border nodes to speed up the search for interest points. Secondly, pruning is used to narrow the expansion range of the road network. Finally, an existing nearest neighbor query algorithm for time-dependent road networks is applied to each found interest point to determine whether it belongs to the reverse k nearest neighbor result. Extensive experiments compared mTD-SubG with the existing algorithm mTD-Eager. The results show that the response time of mTD-SubG is 85.05% less than that of mTD-Eager, and that mTD-SubG traverses 51.40% fewer nodes than mTD-Eager.