Database Technology

Geo-semantic Data Storage and Retrieval Mechanism Based on CAN

LU Hai-chuan, FU Hai-dong, LIU Yu

Computer Science 2019, 46 (2): 171-177. DOI: 10.11896/j.issn.1002-137X.2019.02.027

Semantic technology enables more intelligent and accurate information retrieval and helps researchers make scientific decisions. It has therefore been introduced into geographic information processing, giving rise to GeoSPARQL, a geographic query language based on RDF (Resource Description Framework). However, existing application platforms for geographic semantic information processing rely on centralized storage and retrieval services, which suffer from a single point of failure and poor scalability. Although researchers have proposed a variety of methods that use peer-to-peer networks to improve the reliability and scalability of application systems, these methods do not consider the characteristics of geographic semantic data. To address these problems, this paper takes the features of geographic semantic data into account and optimizes the storage of semantic data on a peer-to-peer network. It proposes a storage and retrieval scheme based on a content addressable network (CAN) and improves the retrieval efficiency of semantic data by mapping each triple onto the network according to its geographic position. The experimental results show that the proposed scheme scales well and that its query efficiency for topological relations is superior to that of existing schemes.
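
The idea of placing a triple on the network according to its position can be illustrated with a toy, single-process model of a CAN key space. This is only a sketch under stated assumptions, not the scheme from the paper: the class names `Zone` and `ToyCAN`, the normalization of longitude/latitude to the unit square, and the `ex:` resource names are all hypothetical, and real CAN routing between neighboring peers is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Axis-aligned rectangle [x0, x1) x [y0, y1) owned by one (simulated) peer."""
    x0: float
    y0: float
    x1: float
    y1: float
    triples: list = field(default_factory=list)

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def normalize(lon, lat):
    """Assumed mapping of longitude/latitude onto the unit square (the key space)."""
    eps = 1e-12  # keep lon=180 / lat=90 inside the half-open space
    return (min((lon + 180.0) / 360.0, 1.0 - eps),
            min((lat + 90.0) / 180.0, 1.0 - eps))


class ToyCAN:
    def __init__(self):
        # A single peer initially owns the whole key space.
        self.zones = [Zone(0.0, 0.0, 1.0, 1.0)]

    def owner(self, x, y):
        return next(z for z in self.zones if z.contains(x, y))

    def split(self, zone):
        """Simulate a peer join: halve the zone along its longer side."""
        if zone.x1 - zone.x0 >= zone.y1 - zone.y0:
            mid = (zone.x0 + zone.x1) / 2
            new = Zone(mid, zone.y0, zone.x1, zone.y1)
            zone.x1 = mid
        else:
            mid = (zone.y0 + zone.y1) / 2
            new = Zone(zone.x0, mid, zone.x1, zone.y1)
            zone.y1 = mid
        # Hand over the triples whose position now falls into the new zone.
        keep, move = [], []
        for entry in zone.triples:
            (move if new.contains(*entry[0]) else keep).append(entry)
        zone.triples, new.triples = keep, move
        self.zones.append(new)
        return new

    def put(self, lon, lat, triple):
        """Store an RDF triple at the peer that owns its geographic position."""
        x, y = normalize(lon, lat)
        self.owner(x, y).triples.append(((x, y), triple))

    def query_box(self, lon0, lat0, lon1, lat1):
        """Return triples inside a bounding box, visiting only overlapping zones."""
        x0, y0 = normalize(lon0, lat0)
        x1, y1 = normalize(lon1, lat1)
        hits = []
        for z in self.zones:
            if z.x1 <= x0 or z.x0 > x1 or z.y1 <= y0 or z.y0 > y1:
                continue  # zone does not overlap the query box
            hits += [t for (x, y), t in z.triples
                     if x0 <= x <= x1 and y0 <= y <= y1]
        return hits
```

Because nearby positions land in the same or adjacent zones, a spatially bounded query only has to contact the peers whose zones overlap the query region, which is the intuition behind position-based placement.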

Optimization of Spark RDD Based on Non-serialization Native Storage

ZHAO Jun-xian, YU Jian

Computer Science 2019, 46 (5): 143-149. DOI: 10.11896/j.issn.1002-137X.2019.05.022

More and more enterprises are adopting Spark as their big data computing framework. However, as the memory available on current servers increases, Spark does not adapt well to the new environment. Spark runs on the Java Virtual Machine (JVM). When heap memory is used heavily, the share of total job time that the JVM spends on garbage collection (GC), that is, reclaiming memory to make room for new objects, increases significantly, so the efficiency of Spark jobs does not improve in proportion to the available memory. After switching to the off-heap (native) memory storage mode, the cost of serialization/deserialization replaces GC as the new bottleneck. This paper uses native storage to deal with the GC problem and speeds up jobs by reducing GC overhead. It also modifies the storage structure of Spark and improves the eviction mechanism and caching of RDDs. Data is moved into native memory without serialization, which yields low garbage collection overhead and avoids the time spent on serialization. Experimental results show that, on a single-node server with large memory, the GC cost of the modified method is 5% to 30% of that of Spark's on-heap storage. Meanwhile, the serialization overhead decreases, the throughput increases, and the job running time is reduced by more than 8%.

DFTS:A Top-k Skyline Query for Large Datasets

WEI Liang, LIN Zi-yu, LAI Yong-xuan

Computer Science 2019, 46 (5): 150-156. DOI: 10.11896/j.issn.1002-137X.2019.05.023

The Top-k Skyline query combines the features of the Top-k query and the Skyline query and can find the best objects in a dataset. However, existing methods do not cope well with large datasets. This paper proposes DFTS, an efficient Top-k Skyline query method that performs well on large datasets. DFTS involves three steps. First, a degree-score function is used to rank the dataset, and the large number of low-ranked objects is filtered out. Second, DFTS runs a Skyline query on the remaining candidates to generate a Skyline subset. Finally, the k highest-ranked objects are selected from the Skyline subset as the final result. Through these steps, DFTS significantly reduces the time cost. It is proved that the results of DFTS satisfy the requirements of the Top-k Skyline query. Extensive experimental results show that DFTS achieves much better performance on large datasets than state-of-the-art methods.
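
The three steps (rank and filter, Skyline on the candidates, pick the top k) can be sketched as follows. This is a minimal illustration, not the DFTS implementation: the abstract does not define the degree-score function, so the sketch assumes smaller attribute values are better and uses the plain attribute sum as a stand-in score; the function names and the `keep` parameter are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Naive O(n^2) Skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k_skyline(points, k, keep, score=sum):
    """Filter-then-Skyline sketch.

    points: list of equal-length numeric tuples
    k:      number of results wanted
    keep:   number of best-scored candidates that survive the filter
    score:  stand-in ranking function (lower is better), assumed monotone
    """
    # Step 1: rank by score and drop the low-ranked objects.
    candidates = sorted(points, key=score)[:keep]
    # Step 2: Skyline query on the much smaller candidate set.
    sky = skyline(candidates)
    # Step 3: the k best-scored Skyline objects are the answer.
    return sorted(sky, key=score)[:k]


points = [(1, 9), (2, 2), (9, 1), (3, 3), (8, 8), (5, 5)]
result = top_k_skyline(points, k=2, keep=4)
```

With a monotone score such as the sum, any point that dominates a candidate has a strictly lower score and therefore also survives the filter, so the Skyline of the candidates contains only true Skyline points. The filter is safe as long as `keep` is large enough to leave at least k Skyline points.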

Optimization Algorithm of Complementary Register Usage Between Two Register Classes in Register Spilling for DSP Register Allocation

QIU Ya-qiong, HU Yong-hua, LI Yang, TANG Zhen, SHI Lin

Computer Science 2019, 46 (6): 196-200. DOI: 10.11896/j.issn.1002-137X.2019.06.029

Register allocation has become one of the most important compiler optimization techniques because registers are limited and valuable resources in a computer's hardware architecture. One of the key factors affecting the quality of register allocation is the memory access cost incurred by spilling registers. For DSP architectures with two classes of general-purpose registers, this paper proposes a strategy for complementary use of the two register classes and a corresponding register spilling optimization algorithm based on the graph coloring register allocation method. By distinguishing interference between candidates of the same register class from interference between candidates of different register classes, an undirected graph is built through an improved analysis of variables' live ranges. Compared with conventional graph coloring register allocation, the improved algorithm fully considers the interference among register allocation candidates of the two register classes, thus achieving fewer memory access operations during register spilling and higher code performance.

Database-level Web Cache Replacement Strategy Based on SVM Access Prediction Mechanism

YANG Rui-jun, ZHU Ke, CHENG Yan

Computer Science 2019, 46 (6): 201-205. DOI: 10.11896/j.issn.1002-137X.2019.06.030

Web caching is used to reduce network access delay and network congestion, and the cache replacement strategy directly affects the cache hit rate. For this reason, this paper proposes a database-level Web cache replacement strategy based on an SVM access prediction mechanism. First, a feature data set is constructed from users' previous access logs by extracting multiple features in a pre-processing step. Then, a Support Vector Machine (SVM) classifier is trained to predict whether a cached object is likely to be accessed again in the future, and cached objects classified as unlikely to be accessed again are deleted to free memory. Simulation results show that, compared with the traditional LRU, LFU and GDSF schemes, this strategy achieves a higher request hit rate and byte hit rate.
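
The replacement policy described above, evicting objects that a classifier predicts will not be accessed again, can be sketched with a pluggable predictor. This is an illustrative sketch only: the class `PredictiveCache` and the callable `will_be_reused` are hypothetical, the callable merely stands in for a trained SVM classifier, and the LRU fallback is an assumption of the sketch rather than something the abstract specifies.

```python
from collections import OrderedDict


class PredictiveCache:
    """Cache that first evicts entries predicted not to be reused,
    then falls back to least-recently-used order."""

    def __init__(self, capacity, will_be_reused):
        self.capacity = capacity
        # Stand-in for the trained classifier: key -> bool.
        self.will_be_reused = will_be_reused
        self.items = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self._evict(just_added=key)

    def _evict(self, just_added):
        # Pass 1: drop entries classified as "will not be accessed again".
        for k in list(self.items):
            if len(self.items) <= self.capacity:
                return
            if k != just_added and not self.will_be_reused(k):
                del self.items[k]
        # Pass 2 (assumed fallback): plain LRU eviction.
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)


# Toy predictor: keys prefixed with "hot" are predicted to be reused.
cache = PredictiveCache(2, lambda key: key.startswith("hot"))
cache.put("hot:a", 1)
cache.put("cold:b", 2)
cache.put("hot:c", 3)  # evicts "cold:b", whereas plain LRU would evict "hot:a"
```

In a real deployment the predictor would be a classifier trained on features extracted from the access logs; here it is reduced to a key-prefix check so that the eviction order is easy to follow.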

Logless Hash Table Based on NVM

WANG Tao, LIANG Xiao, WU Qian-qian, WANG Peng, CAO Wei, SUN Jian-ling

Computer Science 2019, 46 (9): 66-72. DOI: 10.11896/j.issn.1002-137X.2019.09.008

Emerging non-volatile memory (NVM) is attracting growing attention. Thanks to its low latency, persistence, large capacity and byte addressability, a database system can run on an NVM-only storage architecture. In this configuration, novel logless indexing structures have emerged that are expected to recover indexing capability immediately after a system failure. However, under the current computer architecture, these structures need a large number of synchronizations to ensure data consistency, which leads to a severe performance penalty. The NVM-based logless hash table leverages atomic pointer updates to ensure consistency. An optimized rehash procedure is proposed that not only reduces synchronizations during normal execution but also ensures instant recovery after system failures. Performance evaluation shows that, compared with existing persistent indexing structures, the logless hash table performs well under most workloads and has significant advantages in recovery time, NVM footprint and write wear.

Dynamic Skyline Query for Multiple Mobile Users Based on Road Network

ZHOU Jian-gang, QIN Xiao-lin, ZHANG Ke-heng, XU Jian-qiu

Computer Science 2019, 46 (9): 73-78. DOI: 10.11896/j.issn.1002-137X.2019.09.009

With the development of wireless communication and positioning technology, the road network Skyline query has become increasingly important in location-based services. However, existing road network Skyline research considers only distance as a spatial attribute and ignores how changes in the positions and speeds of multiple mobile users affect their travel time. When a user's movement state changes, the Skyline results need to be dynamically adjusted and re-planned. This paper analyzes the relationship between the users' motion state and the query, proposes the query processing algorithm EI, and divides the query process into two steps. First, the initial Skyline result set is determined according to time by the collaborative filtering extension method, and the data set is pruned. Second, as soon as a user's movement state (speed) changes, the Skyline set is quickly adjusted according to the entry point. Finally, the algorithm is tested on a real road network and compared with the existing algorithms N3S and EDC. The results show that the EI algorithm efficiently solves the dynamic Skyline query problem for multiple mobile users on road networks.

Study on Heterogeneous Multimodal Data Retrieval Based on Hash Algorithm

CHEN Feng, MENG Zu-qiang

Computer Science 2019, 46 (10): 49-54. DOI: 10.11896/jsjkx.190100139

The era of big data has brought exponential growth of heterogeneous multimodal data on the Internet, including text, images, video and audio. Heterogeneous multimodal data retrieval has therefore become a hot topic in big data research. However, it faces two major challenges. The first is how to express the similarity between heterogeneous data in the presence of a “semantic gap”. The second is how to achieve accurate and efficient retrieval over massive data. To address the problem that hash retrieval algorithms ignore the semantic similarity of heterogeneous multimodal data, this paper proposes a hash retrieval algorithm based on canonical correlation analysis and semantic consistency, named CCA-SCH. To preserve semantic consistency within each modality, CCA-SCH generates separate semantic models for text and image data. To preserve semantic consistency between modalities, the CCA algorithm is used to fuse the semantics of text and image data and generate the maximum correlation matrix. In addition, the ℓ_{2,ρ} norm is introduced to suppress the noise and redundant information in the original datasets, making the hash function more robust. Experimental results show that the mean average precision (MAP) of CCA-SCH is more than 10% higher than that of the benchmark algorithms on the experimental data sets, which demonstrates the better retrieval ability of the proposed algorithm.

Column-oriented Store Based Sampling Query Process on Big Data

QI Wen, BAO Yu-bin, SONG Jie

Computer Science 2019, 46 (12): 13-19. DOI: 10.11896/jsjkx.190500155

The era of big data brings performance challenges to traditional data queries: even if a query algorithm has linear O(n) complexity, its time cost becomes unbearable when n is extremely large. In many practical applications, exact query results are unnecessary but queries must be completed within a given time, so sacrificing some query accuracy is acceptable to meet performance constraints. Sampling queries can improve query performance by reducing the query range. Existing research often targets specific algorithms and application scenarios; there is a lack of research on general sampling and query methods in big data environments, as well as on performance and accuracy guarantees. This paper studies sampling and query processing in the big data environment and improves the query efficiency of big data through data partitioning and data reduction. It proposes a sampling method based on speedup and potential distribution, which supports all kinds of sampling algorithms, provides randomness guarantees, performance assurance and approximation evaluation for sampling queries in a distributed environment, and is compatible with exact queries. The method can be applied to column stores for big data with good scalability and maintainability. Taking the Top-K query as an example, the experimental results show that the proposed method has better loading performance, with sampling errors below 2% and variances of query accuracy between 0.1 and 0.12 under various sampling rates, data volumes and sampling algorithms. The sampling efficiency of the proposed partitioning is also higher than that of sampling based on linear or uniform partitioning.
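
The trade-off between accuracy and cost that motivates sampling queries can be shown with plain uniform random sampling over one column. This is not the speedup and potential-distribution based method of the paper; it is a minimal baseline, and the function names, the `rate` parameter and the fixed seed are assumptions made for the example.

```python
import random


def sample_rows(column, rate, seed=0):
    """Draw a uniform random sample without replacement.

    column: list of numeric values (one column of a column store)
    rate:   sampling rate in (0, 1]
    seed:   fixed seed so the example is reproducible
    """
    rng = random.Random(seed)
    n = max(1, round(len(column) * rate))
    return rng.sample(column, n)


def approx_sum(column, rate, seed=0):
    """Estimate SUM(column) by scaling the sample sum up to the full size."""
    sample = sample_rows(column, rate, seed)
    return sum(sample) * len(column) / len(sample)


column = list(range(1, 100_001))
exact = sum(column)
estimate = approx_sum(column, rate=0.01)      # reads only 1% of the values
relative_error = abs(estimate - exact) / exact
```

The estimate touches 1,000 of the 100,000 values, so the scan cost drops by roughly the sampling rate, while `relative_error` quantifies the accuracy given up in exchange.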

K Nearest Neighbors Queries of Moving Objects in Time-dependent Road Networks

ZHANG Tong, QIN Xiao-lin

Computer Science 2020, 47 (1): 79-86. DOI: 10.11896/jsjkx.181102231

With the wide application of location-based services, object queries on time-dependent road networks have gradually become a research hotspot. Most previous research focused only on static objects on time-dependent road networks (such as gas stations and restaurants) and did not take mobile objects (such as taxis) into account, although queries on mobile objects have a very wide range of applications in daily life. Therefore, TD-MOKNN, a K nearest neighbor query algorithm for moving objects on time-dependent road networks, is proposed. The algorithm is divided into a pre-processing stage and a query stage. In the pre-processing stage, the road network and a grid index are built, and a new method for mapping moving objects onto the road network is proposed, which removes the assumption in previous research that moving objects happen to be located at road network intersections. In the query stage, a new efficient heuristic value is calculated using an inverted grid index, and an efficient K nearest neighbor query algorithm is designed using the pre-processed information and the heuristic value. Experiments verify the effectiveness of the algorithm. Compared with the existing algorithm, TD-MOKNN reduces the number of traversed vertices and the response time by 55.91% and 54.57% respectively, and improves query efficiency by 55.2% on average.
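
What "K nearest neighbors on a time-dependent road network" means can be shown with a baseline that expands the network by earliest arrival time. This is not TD-MOKNN: it uses no grid index or heuristic value, and it assumes that each object sits exactly on a vertex, which is the very restriction the paper removes. The function name `td_knn`, the graph encoding and the toy travel-time functions are hypothetical, and edge travel times are assumed to be FIFO (leaving later never means arriving earlier).

```python
import heapq


def td_knn(graph, source, depart, objects, k):
    """Time-dependent Dijkstra that stops after k objects are reached.

    graph:   {u: [(v, travel_time), ...]} where travel_time(t) returns
             the time needed to traverse the edge when entering at time t
    source:  query vertex
    depart:  departure time at the query vertex
    objects: {object_id: vertex} snapshot of object positions
    Returns [(object_id, travel_time_from_source), ...], nearest first.
    """
    at_vertex = {}
    for obj, v in objects.items():
        at_vertex.setdefault(v, []).append(obj)

    best = {source: depart}          # earliest known arrival per vertex
    heap = [(depart, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for obj in at_vertex.get(u, []):
            result.append((obj, t - depart))
        for v, travel_time in graph.get(u, []):
            arrive = t + travel_time(t)  # cost depends on the entry time
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return result[:k]


# Toy network: edge q->b is fast before t=10 and congested afterwards.
graph = {
    "q": [("a", lambda t: 5), ("b", lambda t: 2 if t < 10 else 20)],
    "a": [("c", lambda t: 1)],
    "b": [("c", lambda t: 1)],
}
taxis = {"taxi1": "a", "taxi2": "b", "taxi3": "c"}
off_peak = td_knn(graph, "q", 0, taxis, k=2)
rush_hour = td_knn(graph, "q", 10, taxis, k=2)
```

The same query vertex yields different nearest neighbors at different departure times, which is why static shortest-path distances are not sufficient on time-dependent networks.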
