Database Technology

Geo-semantic Data Storage and Retrieval Mechanism Based on CAN
LU Hai-chuan, FU Hai-dong, LIU Yu
Computer Science    2019, 46 (2): 171-177.   DOI: 10.11896/j.issn.1002-137X.2019.02.027
Semantic technology enables more intelligent and accurate information search and helps researchers make scientific decisions. It has therefore been introduced into geographic information processing, giving rise to GeoSPARQL, a geographic query language based on RDF (Resource Description Framework). However, existing platforms for geographic semantic information processing rely on centralized storage and retrieval services, which makes them prone to single-node failure and limits their scalability. Although researchers have proposed various methods that use peer-to-peer networks to improve the reliability and scalability of application systems, these methods do not consider the characteristics of geographic semantic data. To address these problems, this paper takes the characteristics of geographic semantic data into account and optimizes the storage of semantic data on a peer-to-peer network. It proposes a storage and retrieval scheme based on a content addressable network (CAN) and improves the retrieval efficiency of semantic data by mapping each triple to the network according to its position. Experimental results show that the proposed scheme scales well and that its query efficiency for topological relations is superior to that of existing schemes.
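
The idea of placing a triple in the overlay according to its position can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes a fixed uniform grid of zones instead of CAN's dynamic zone splitting, and the names `GridCAN`, `zone_of`, `store` and `query_bbox` are hypothetical.

```python
class GridCAN:
    """Toy 2-D coordinate space split into a fixed grid of zones.

    Assumption: a real CAN splits zones dynamically as peers join;
    a uniform grid is used here only to keep the sketch short.
    """

    def __init__(self, cells_per_axis=4):
        self.n = cells_per_axis
        # Each zone stands in for one peer and holds (triple, lon, lat) records.
        self.zones = {(i, j): [] for i in range(self.n) for j in range(self.n)}

    def zone_of(self, lon, lat):
        # Normalise longitude/latitude to [0, 1] and pick the grid cell.
        x = (lon + 180.0) / 360.0
        y = (lat + 90.0) / 180.0
        i = min(int(x * self.n), self.n - 1)
        j = min(int(y * self.n), self.n - 1)
        return (i, j)

    def store(self, triple, lon, lat):
        # The triple is placed by its geographic position, not by a hash of its terms.
        self.zones[self.zone_of(lon, lat)].append((triple, lon, lat))

    def query_bbox(self, min_lon, min_lat, max_lon, max_lat):
        # Only zones overlapping the bounding box need to be contacted.
        i0, j0 = self.zone_of(min_lon, min_lat)
        i1, j1 = self.zone_of(max_lon, max_lat)
        hits = []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                for triple, lon, lat in self.zones[(i, j)]:
                    if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                        hits.append(triple)
        return hits


can = GridCAN(cells_per_axis=4)
can.store(("ex:Wuhan", "geo:hasGeometry", "POINT(114.3 30.6)"), 114.3, 30.6)
can.store(("ex:Paris", "geo:hasGeometry", "POINT(2.35 48.86)"), 2.35, 48.86)
nearby = can.query_bbox(100.0, 20.0, 120.0, 40.0)
```

Because nearby features land in the same or adjacent zones, a spatial query touches only a few peers, which is the intuition behind position-based mapping.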
Optimization of Spark RDD Based on Non-serialization Native Storage
ZHAO Jun-xian, YU Jian
Computer Science    2019, 46 (5): 143-149.   DOI: 10.11896/j.issn.1002-137X.2019.05.022
More and more enterprises are adopting Spark as their big data computing framework. However, as the memory available on current servers grows, Spark does not adapt well to the new environment. Spark runs on the Java Virtual Machine (JVM). Because heap memory is used heavily, the share of total job time that the JVM spends on garbage collection (GC), that is, reclaiming memory to make room for new objects, increases significantly, and the efficiency of Spark jobs does not improve proportionally as available memory increases. After switching to the off-heap (native) memory storage mode, the cost of serialization/deserialization replaces GC as the new bottleneck. This paper uses native storage to address the GC problem and speeds up jobs by reducing GC overhead. It also modifies the storage structure of Spark and improves the eviction mechanism and the caching method of RDDs. Data are moved into native memory without serialization, which achieves low garbage collection overhead and avoids the time spent on serialization. Experimental results show that, on a single-node server with large memory, the GC cost of the modified method is 5% to 30% relative to Spark's on-heap storage. Meanwhile, the serialization overhead decreases, throughput increases, and job running time is reduced by more than 8%.
DFTS: A Top-k Skyline Query for Large Datasets
WEI Liang, LIN Zi-yu, LAI Yong-xuan
Computer Science    2019, 46 (5): 150-156.   DOI: 10.11896/j.issn.1002-137X.2019.05.023
The Top-k Skyline query combines the features of the Top-k query and the Skyline query to find the best objects in a dataset. However, existing methods do not scale well to large datasets. This paper proposes DFTS, an efficient Top-k Skyline query method that performs well on large datasets. DFTS involves three steps. First, the degree-score function is used to rank the dataset, and a large number of low-ranking objects are filtered out. Second, DFTS runs a Skyline query on the remaining candidates and generates a Skyline subset. Finally, the k highest-ranking objects are selected from the Skyline subset as the final result. Through these steps, DFTS significantly reduces the time cost. It is proved that the results of DFTS satisfy the requirements of the Top-k Skyline query. Extensive experimental results show that DFTS achieves much better performance on large datasets than state-of-the-art methods.
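
The three steps described in the abstract can be sketched roughly as follows. The abstract does not define the degree-score function, so this sketch assumes a stand-in score (the sum of attribute values, smaller is better); `keep_ratio` and all function names are likewise hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k_skyline(points, k, keep_ratio=0.5, score=sum):
    # Step 1: rank by score and filter out the low-ranking objects.
    ranked = sorted(points, key=score)
    keep = max(k, int(len(ranked) * keep_ratio))
    candidates = ranked[:keep]
    # Step 2: skyline query over the surviving candidates only.
    sky = skyline(candidates)
    # Step 3: return the k best-ranked skyline objects.
    return sorted(sky, key=score)[:k]


hotels = [(1, 9), (2, 2), (9, 1), (5, 5), (3, 8), (8, 8)]  # (price, distance)
best = top_k_skyline(hotels, k=2)
```

With a score that is monotone in every attribute, such as the sum used here, any point that dominates a candidate has a strictly smaller score and therefore also survives the filter, so the filtering step cannot introduce objects that are outside the true skyline.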
Optimization Algorithm of Complementary Register Usage Between Two Register Classes in Register Spilling for DSP Register Allocation
QIU Ya-qiong, HU Yong-hua, LI Yang, TANG Zhen, SHI Lin
Computer Science    2019, 46 (6): 196-200.   DOI: 10.11896/j.issn.1002-137X.2019.06.029
Register allocation has become one of the most important compiler optimization techniques because registers are limited and valuable resources in computer hardware architectures. One of the key factors affecting the results of register allocation is the memory access cost incurred by spilling signed registers. For DSP architectures with two classes of general-purpose registers, this paper proposes a complementary utilization strategy between the two register classes and a corresponding register spilling optimization algorithm based on graph-coloring register allocation. By distinguishing interference between candidates of the same register class from interference between candidates of different register classes, an undirected graph is built using an improved analysis of variables’ live ranges. Compared with conventional graph-coloring register allocation, the improved algorithm fully considers the interference among the register allocation candidates of the two register classes, and thus achieves fewer memory access operations in register spilling and higher code performance.
Database-level Web Cache Replacement Strategy Based on SVM Access Prediction Mechanism
YANG Rui-jun, ZHU Ke, CHENG Yan
Computer Science    2019, 46 (6): 201-205.   DOI: 10.11896/j.issn.1002-137X.2019.06.030
Web caching is used to reduce network access delay and network congestion, and the cache replacement strategy directly affects the cache hit rate. This paper therefore proposes a database-level Web cache replacement strategy based on an SVM access prediction mechanism. First, a feature data set is constructed from users' previous access logs by extracting multiple features in a pre-processing step. Then, a Support Vector Machine (SVM) classifier is trained to predict whether a cached object is likely to be accessed again in the future, and cached objects classified as unlikely to be accessed again are deleted to free memory. Simulation results show that, compared with the traditional LRU, LFU and GDSF schemes, this strategy achieves a higher request hit rate and byte hit rate.
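
A prediction-driven replacement policy of this kind might look like the sketch below. It is an illustration under assumptions, not the paper's implementation: the trained SVM is replaced by a hypothetical callable `will_be_reused(features)`, the feature dictionary is invented for the example, and falling back to LRU when every object is predicted to be reused is a design choice made only for this sketch.

```python
from collections import OrderedDict


class PredictiveCache:
    """Cache that evicts objects a classifier predicts will not be accessed again."""

    def __init__(self, capacity, will_be_reused):
        self.capacity = capacity
        # Stand-in for a trained SVM: features -> True if re-access is predicted.
        self.will_be_reused = will_be_reused
        self.items = OrderedDict()  # key -> (value, features), oldest first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key][0]

    def put(self, key, value, features):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = (value, features)
        while len(self.items) > self.capacity:
            self._evict_one()

    def _evict_one(self):
        # Prefer the oldest object classified as "will not be accessed again".
        for key, (_, features) in self.items.items():
            if not self.will_be_reused(features):
                del self.items[key]
                return
        # Assumed fallback: plain LRU when every object is predicted to be reused.
        self.items.popitem(last=False)


cache = PredictiveCache(capacity=2, will_be_reused=lambda f: f["hits"] >= 2)
cache.put("a", "A", {"hits": 5})
cache.put("b", "B", {"hits": 1})
cache.put("c", "C", {"hits": 3})  # "b" is predicted cold and is evicted
```

In practice the callable would wrap a classifier trained on features extracted from access logs, as the abstract describes.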
Logless Hash Table Based on NVM
WANG Tao, LIANG Xiao, WU Qian-qian, WANG Peng, CAO Wei, SUN Jian-ling
Computer Science    2019, 46 (9): 66-72.   DOI: 10.11896/j.issn.1002-137X.2019.09.008
Emerging non-volatile memory (NVM) is attracting increasing attention. Thanks to its low latency, persistence, large capacity and byte addressability, a database system can run on an NVM-only storage architecture. In this configuration, novel logless indexing structures have emerged that are expected to recover indexing capability immediately after a system failure. However, under the current computer architecture, these structures need a large number of synchronization operations to ensure data consistency, which leads to a severe performance penalty. The NVM-based logless hash table leverages atomic updates of pointer data to ensure consistency. An optimized rehash procedure is proposed that not only reduces synchronization during normal execution but also ensures instant recovery after system failures. Performance evaluation shows that, compared with existing persistent indexing structures, the logless hash table performs well under most workloads and has significant advantages in terms of recovery time, NVM footprint and write wear.
Dynamic Skyline Query for Multiple Mobile Users Based on Road Network
ZHOU Jian-gang, QIN Xiao-lin, ZHANG Ke-heng, XU Jian-qiu
Computer Science    2019, 46 (9): 73-78.   DOI: 10.11896/j.issn.1002-137X.2019.09.009
With the development of wireless communication and positioning technology, the road network Skyline query has become increasingly important in location-based services. However, existing research on road network Skyline queries considers only distance as a spatial attribute and does not consider how changes in the positions and speeds of multiple mobile users affect the users’ travel time. When a user’s movement state changes, the Skyline results need to be dynamically adjusted and re-planned. This paper analyzes the relation between the users’ motion state and the query, proposes the query processing algorithm EI, and divides the query process into two steps. First, the initial Skyline result set is determined by the collaborative filtering extension method according to time, and the data set is pruned. Second, as soon as a user’s speed changes, the Skyline set is quickly adjusted according to the user’s movement state and the entry point. Finally, the algorithm is tested on a real road network and compared with the existing algorithms N3S and EDC. The results show that the EI algorithm can efficiently solve the dynamic Skyline query problem for multiple mobile users on a road network.
Study on Heterogeneous Multimodal Data Retrieval Based on Hash Algorithm
CHEN Feng, MENG Zu-qiang
Computer Science    2019, 46 (10): 49-54.   DOI: 10.11896/jsjkx.190100139
The era of big data has brought exponential growth in heterogeneous multimodal data on the Internet, including text, images, video and audio. Heterogeneous multimodal data retrieval has therefore become a hot topic in big data research. However, it faces two major challenges. The first is how to express the similarity between heterogeneous data in the presence of a “semantic gap”. The second is how to achieve accurate and efficient retrieval over massive data. To solve the problem that hash retrieval algorithms ignore the semantic similarity of heterogeneous multimodal data, this paper proposes a hash retrieval algorithm based on canonical correlation analysis and semantic consistency, named CCA-SCH. To preserve semantic consistency within each modality, the CCA-SCH algorithm generates separate semantic models for text and image data. To preserve semantic consistency between modalities, the CCA algorithm is used to fuse the semantics of text and image data and generate the maximum correlation matrix. In addition, the l2,ρ norm is introduced to suppress the noise and redundant information in the original datasets, so that the hash function is more robust. Experimental results show that the mean average precision (MAP) of the CCA-SCH algorithm is more than 10% higher than that of the benchmark algorithms on the experimental data sets, which demonstrates the better retrieval ability of the proposed algorithm.
Column-oriented Store Based Sampling Query Process on Big Data
QI Wen, BAO Yu-bin, SONG Jie
Computer Science    2019, 46 (12): 13-19.   DOI: 10.11896/jsjkx.190500155
The era of big data brings performance challenges to traditional data queries: even when a query algorithm has O(n) linear complexity, its time cost becomes unbearable when n is extremely large. In many practical applications, exact query results are unnecessary but queries must complete within a given time, so sacrificing some query accuracy is acceptable in order to meet performance constraints. Sampling queries can improve query performance by reducing the query range. Existing studies often target specific algorithms and specific application scenarios, and there is a lack of research on general sampling and query methods in the big data environment, as well as on performance and accuracy guarantees. This paper studies sampling and query processing in the big data environment and improves the query efficiency of big data through data partitioning and data reduction. It proposes a sampling method based on speedup and potential distribution, which supports all kinds of sampling algorithms, provides randomness guarantees, performance assurance and approximation evaluation for sampling queries in a distributed environment, and is compatible with exact queries. The method can be applied to column stores for big data and offers good scalability and maintainability. Experimental results with the Top-K query as a case study show that the proposed method has better loading performance, that the sampling errors are less than 2%, and that the variances of query accuracy are between 0.1 and 0.12 under various sampling rates, data volumes and sampling algorithms. The sampling efficiency of the proposed partitioning is also higher than that of sampling based on linear partitioning or uniform partitioning.
K Nearest Neighbors Queries of Moving Objects in Time-dependent Road Networks
ZHANG Tong, QIN Xiao-lin
Computer Science    2020, 47 (1): 79-86.   DOI: 10.11896/jsjkx.181102231
With the wide application of location-based services, object-based queries on time-dependent road networks have gradually become a research hotspot. Most previous research focused only on static objects on time-dependent road networks (such as gas stations and restaurants) and did not take mobile objects (such as taxis) into account, even though queries over mobile objects have a very wide range of applications in daily life. This paper therefore proposes TD-MOKNN, a K nearest neighbor query algorithm for moving objects on time-dependent road networks. The algorithm is divided into a pre-processing stage and a query stage. In the pre-processing stage, the road network and a grid index are built, and a new method of mapping moving objects to the road network is proposed, which removes the restriction in previous research that moving objects must lie exactly on road network intersections. In the query stage, a new efficient heuristic value is calculated using an inverted grid index, and an efficient k-nearest neighbor query algorithm is designed using the pre-processing information and the heuristic value. Experiments verify the effectiveness of the algorithm. Compared with the existing algorithm, TD-MOKNN reduces the number of traversed vertices and the response time by 55.91% and 54.57% respectively, and improves query efficiency by 55.2% on average.
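
As a baseline illustration of k-nearest-neighbor search on a time-dependent road network, the sketch below runs a time-dependent Dijkstra expansion from the query vertex. It is not TD-MOKNN: there is no grid index or heuristic value, objects are assumed to sit on vertices in a single position snapshot (the restriction the paper removes), edge travel times are assumed to satisfy the FIFO property, and `td_knn` and the toy graph are hypothetical.

```python
import heapq


def td_knn(graph, source, depart, objects, k):
    """Return up to k (object_id, travel_time) pairs in order of earliest arrival.

    graph:   {u: [(v, cost_fn)]} where cost_fn(t) is the travel time when leaving u at time t
    objects: {object_id: vertex} snapshot of object positions (assumption)
    """
    at_vertex = {}
    for obj, v in objects.items():
        at_vertex.setdefault(v, []).append(obj)

    arrival = {source: depart}
    heap = [(depart, source)]
    done = set()
    result = []
    while heap and len(result) < k:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        # Vertices are settled in order of arrival time, so objects found here
        # are the next nearest in travel time.
        for obj in sorted(at_vertex.get(u, [])):
            result.append((obj, t - depart))
            if len(result) == k:
                break
        for v, cost in graph.get(u, []):
            t_v = t + cost(t)  # edge cost depends on the time of entering the edge
            if t_v < arrival.get(v, float("inf")):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return result


roads = {
    "A": [("B", lambda t: 5 if t < 10 else 20),  # congested from t = 10 onwards
          ("C", lambda t: 8)],
    "B": [("D", lambda t: 2)],
    "C": [("D", lambda t: 1)],
}
taxis = {"taxi1": "B", "taxi2": "C", "taxi3": "D"}
off_peak = td_knn(roads, "A", 0, taxis, k=2)
peak = td_knn(roads, "A", 10, taxis, k=2)
```

The two calls return different neighbor sets for the same query vertex, which shows why the departure time has to be part of the query on a time-dependent network.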