Computer Science

Cross-domaing Item Recommendation Algorithm Including Tag Transfer

GE Meng-fan, LIU Zhen, WANG Na-na, TIAN Jing-yu

Computer Science. 2019, 46 (10): 1-6. doi:10.11896/jsjkx.180901792

Abstract

PDF(2096KB) ( 2054 )

References | Related Articles | Metrics

Most recommendation algorithms often use cross-domain recommendation technology based on transfer lear-ning and rich data in the auxiliary domain to solve the problems such as data sparse commonly existing in traditional single domain recommendation.However,if the transtered knowledge is relatively simple without combining user beha-vior,it will lead to the problems such as negative transfer and poor recommendation results.Therefore,it is possible to combine other knowledge to assist the learning tasks in target domain.Using user heterogeneous behavior to improve recommendation results is one of the emerging research hotspots in recent years.For user data,tags are related to the real user preferences,which can reflect some implicit features of user or item.In light of this,this paper proposed a cross-domain item recommendation algorithm ITTCF(Item-based Tag Transfer Collaborative Filtering)based on tag transfer.Instead of single auxiliary moded of performing mining and migration for rating pattern in cross-domain recommendation,this method combines user behavior feedback and numeric ratings,and fuses two typical user behaviors:ra-ting patterns and tags.Experimental results on multiple datasets show that ITTCF has lower RMSE and MAE values,and its performance is 1.61% to 6.67% and 1.97% to 8.83% higher respectively than traditional algorithms.

Individual Credit Risk Assessment Based on Stacked Denoising Autoencoder Networks

YANG De-jie, ZHANG Ning, YUAN Ji, BAI Lu

Computer Science. 2019, 46 (10): 7-13. doi:10.11896/jsjkx.181102216

Abstract

PDF(1972KB) ( 1696 )

References | Related Articles | Metrics

Personal credit is the most important factor for banks to measure individual compliance risk.In recent years,with the increasing demand for borrowing in China,the traditional way of making credit evaluation,which is merely based on credit card transaction information,cannot fully meet the development needs of the banking industry.Therefore,this paper proposed to use the big data of personal consumption in bank as the important feature information to construct a richer user image.In order to overcome the dimensional curse and noise caused by the financial big data,a modified deep learning evaluation algorithm based on stacked denoising autoencoder neural network is proposed by considering the correlation of feature data and the truncated Karhunen-Loève expansion is applied as the noise input term,then a series of related data experiments are conducted on big data platform of a commercial bank.The experimental results show that,compared with the risk evaluation just based on credit card transaction information,the K-S value that measure the positive and negative sample resolution based on big data of bank improves 11%;the improved stack denoising autoencoder neural network method has better risk assessment results and the accuracy rate is increased by about 3% compared with the original model,thus validating the effectiveness of credit risk assessment in the big data environment of bank.

Sunburst Visualization for Comment Text Data

YI Xiao-qun, LI Tian-rui, CHEN Chao

Computer Science. 2019, 46 (10): 14-18. doi:10.11896/jsjkx.190100087

Abstract

PDF(1919KB) ( 1463 )

References | Related Articles | Metrics

Sunburst is a kind of modern pie chart.It goes beyond the traditional pie chart and ring chart.It can not only express the proportion of data,but also express the clear hierarchy and attribution relationship,and display the data composition with the hierarchical structure of father and son.When Sunburst is used to visualize text data,it can not fully display the entity relationship and emotional bias.In addition,the more hierarchies of the Sunburst is involved,the lower readability of the information will be.In view of the above problems,this paper proposed the following improvements to the traditional Sunburst.Firstly,the overlapping of the same level adjacent arc is designed to show the relation of the entity in the text.Sencondly,the combination of Sunburst and histogram is put forward to show the emotional bias of the comment text.The color width of the arc in histogram chart expresses the satisfaction of the comment on a certain aspect.Thirdly,the data are rearranged optimally,including that for the overall consideration,the protruding part is placed in the adjacent position to save space,and the local data is optimized and rearranged to make the outermost nodes as high and low as possible,so as to improve the sparsity and facilitate observation.The experimental results show that the improved Sunburst can provide more comprehensive and clear visualization of comment text,and provide more flexible and personalized visualization display for users.

Sentiment Analysis of User Comments Based on Extraction of Key Words and Key Sentences

YU Ying, CHEN Ke, SHOU Li-dan, CHEN Gang, WU Xiao-fan

Computer Science. 2019, 46 (10): 19-26. doi:10.11896/jsjkx.191000531C

Abstract

PDF(1829KB) ( 2822 )

References | Related Articles | Metrics

One of the main task of sentiment analysis is to determine the polarity of a review whether it is positive or negative according to the content of the document.When determining the emotional polarity of a document,different sentences and words have different emotional contribution on the classification result,so how to extract more related words and sentences becomes an important problem.In the experiment of the supervised classification,this paper used the dependency syntactic analysis to extract the words which are more related to the emotion and can improve the classification effect.In the semi-supervised classification experiment,the key sentence extraction and the combining-classifier method based on the Chinese comments have been used.For key sentence extraction,the proposed approach takes the following attributes into account:sentiment attribute,position attribute,special word attribute and punctuation attri-bute.This approach extracts key sentences containing more emotional words and summary meaning,then uses combining-classifier method to make the sub classifier with the highest confidence to determine the classification effect.The results show that the performance of the proposed method is better than the baseline methods,which proves the validity of keyword extraction based on the dependency parsing and key Chinese sentence extraction algorithms.

Recommendation Algorithm with Field Trust and Distrust Based on SVD

ZHANG Qi, LIU Ling, WEN Jun-hao

Computer Science. 2019, 46 (10): 27-31. doi:10.11896/jsjkx.190300388

Abstract

PDF(1434KB) ( 1178 )

References | Related Articles | Metrics

The collaborative filtering algorithms in recommender systems usually suffer from data sparsity or cold-start problems.Although most of the existing social recommendation algorithms can alleviate these problems to a certain extent,they only measure the influence of trust relationship from a single aspect.In order to measure the influence of the social relationship on recommendation prediction more accurately,this paper proposed a novel social recommendation algorithm with field trust and distrust based on singular value decomposition (SVD),named FTDSVD.Based on the SVD algorithm,the trust relationship and distrust relationship information of users is added in order to correct the social relationship,and the global influence of users and the field relevance of trust are considered.Finally,it is compared with the state-of-the-art methods on the Epinions dataset .Experiment results show that the FTDSVD algorithm has obvious effects in improving the recommendation quality and alleviating the cold start problem.

Predicting User’s Dynamic Preference Based on Embedding Learning

WEN Wen, LIN Ze-tian, CAI Rui-chu, HAO Zhi-feng, WANG Li-juan

Computer Science. 2019, 46 (10): 32-38. doi:10.11896/jsjkx.180901801

Abstract

PDF(1282KB) ( 1890 )

References | Related Articles | Metrics

Traditional methods for capturing user preferences mainly focus on user’s long-term preferences.However,user interests always change over time in real-world applications.As a result,how to capture user’s dynamic prefe-rences still remains a big challenge.This paper proposed an embedding-based approach for predicting user’s dynamic preferences.Firstly,an improved embedding method is used for learning the low-dimensional vector representations of items from user’s click sequences.Then,based on the learned item vectors and user’s short-term click behaviors,user’sdynamic preferences are obtained and used for predicting the next click.Experiments were conducted on two real-world datasets and the proposed method was compared with state-of-the-art methods.The results demonstrate the significant superiority of the proposed method in prediction accuracy compared with other algorithms.

Topological Structure Based Density Peak Algorithm for Overlapping Community Detection

FENG Yun-fei, CHEN Hong-mei

Computer Science. 2019, 46 (10): 39-48. doi:10.11896/jsjkx.180901644

Abstract

PDF(3661KB) ( 1343 )

References | Related Articles | Metrics

With the continuous development of modern network science,people’s lives have been greatly facilitated.The study on complex networks is an important driving force for the development of modern network science,and the community is an important structure for research on complex networks.Most of the existing community discovery methods are highly complex,so this paper applied the density peak clustering algorithm proposed in recent years to community discovery,and proposed an efficient community discovery algorithm.Due to the particularity of the data structure of complex network,the complex network data are stored in the form of topological graphorad jacency matrix,and thus how to effectively calculate the distance between nodes and the local density of each node,and how to select the core nodes of the community are the key problems when density peak clustering algorithm is applied tocommunity discovery.In light of this,this paper calculated the local density of each node by the degree of each node and its neighbor nodes in the network topological graph,measured the distance between nodes by the similarity between nodes,and discretized the distance to expediently select the core nodes for this algorithm.Moreover,the core hop values were defined to accurately select community centers,thus avoiding the large communities annex small communities.Experiments were carried out on the LFR artificial network dataset and the real network dataset with the evaluation indexes of the extended modularity,the adjusted rand coefficient and the normalized mutual information.Experimental results in real networks show that the proposed algorithm has good effects and obvious advantages in some real networks compared with other algorithms.In the artificial network,the algorithm also has advantages.Therefore,the proposed algorithm is more stable than other algorithms.

Study on Heterogeneous Multimodal Data Retrieval Based on Hash Algorithm

CHEN Feng, MENG Zu-qiang

Computer Science. 2019, 46 (10): 49-54. doi:10.11896/jsjkx.190100139

Abstract

PDF(2090KB) ( 1507 )

References | Related Articles | Metrics

The development of the era of big data has resulted in an exponentially growing of Internet heterogeneous multimodal data including text,images,video and audio.Therefore,heterogeneous multimodal data retrieval has become a hot direction in big data research.However,heterogeneous multimodal data retrieval encounters two major challenges.The first challenge is how to express the similarity between heterogeneous data while there is a “semantic gap”.The second challenge is how to achieve accurate and efficient retrieval in massive data.To solve the problem that the hash retrieval algorithm ignores semantic similarity of heterogeneous multimodal data,this paper proposed a hash retrieval algorithm based on canonical correlation analysis-semantic consistency,named CCA-SCH.In order to keep semantic consistency within the modality,the CCA-SCH algorithm separately generates semantic models of text and image data.In order to keep semantic consistency between modalities,the CCA algorithm is used to fuse semantics of text and image data to generate the maximum correlation matrix.At the same time,the paradigm ℓ_2,ρ is introduced to overcome the noise and redundant information of original datasets,so that the hash function has better robustness.Experiment results show that the mean average precision(Map) of CCA-SCH algorithm is increased by over 10% compared to benchmark algorithms’ performances on experimental data sets,which embodies the better retrieval ability of proposed algorithm.

Deep Matrix Factorization Network for Matrix Completion

KUANG Shen-fen, HUANG Ye-wen, SONG Jie, LI Qia

Computer Science. 2019, 46 (10): 55-62. doi:10.11896/jsjkx.190300390

Abstract

PDF(1503KB) ( 1958 )

References | Related Articles | Metrics

Matrix factorization is a popular technique for matrix completion,but most of the existing methods are based on linear or shallow models,when the data matrix becomes large and the observation data is very small,it is prone to overfitting and the performance decreases significantly.To address this problem,this paper presented a Deep Matrix Factorization Network (DMFN) method,which can not only overcome the shortcoming of traditional matrix factorization,but also deal with complex non-linear data.First,by using rows and columns vectors corresponding to the observed values of the input matrices as input,the latent vector of its row (column) is projected,then the multi-layer perceptron network is constructed on the latent vector of row (column),at last the output vectors of rows and columns are fused by the bilinear pool layer.The proposed algorithm was tested on the standard recommender system dataset (MovieLens and Netflix).Experimental results show that the mean square error (RMSE) and mean absolute error (MAE) of the proposed method are significantly improved compared with the current popular methods under the same parameter setting.The RMSE is 0.830 and MAE is 0.652 on the MovieLens 1M dataset,which increases by about 2% and 6%,respectively.On the Netflix dataset,RMSE is 0.840 and MAE is 0.661,which increases by approximately 1% and 4%,respectively,and achieves optimal results.

Multi-recording Complex Webpage Information Extraction Algorithm Based on Visual Block

WANG Wei-hong, LIANG Chao-kai, MIN Yong

Computer Science. 2019, 46 (10): 63-70. doi:10.11896/jsjkx.190200346

Abstract

PDF(2582KB) ( 1563 )

References | Related Articles | Metrics

The webpage has rich content and complicated and varied structure.The existing webpage information extraction technology solves the information extraction of the single-recording simple page,but the information extraction effect of the multi-recording complex page is often poor.This paper proposed a new visual block based information extraction algorithm,named visual block based information extraction (VBIE).By constructing visual blocks and visual block trees,and through heuristic rules,regional focus,noise filtering and visual block filtering,data record extraction is realized for complex webpages.Compared with other existing methods,this method abandons the specific assumptions of the previous algorithm on the structure of the webpage,does not need to manually mark the HTML document,preserves the original structure of the webpage,and can realize unsupervised information extraction on a single page.The experimental results show that VBIE’s webpage information extraction accuracy is up to 100%,and the average value of F1 on the results page of the mainstream search engine and the post page of the community forum are 98.5% and 96.1%.Compared with the current method CMDR,the F1 value of VBIE is improved by nearly 16.3%,which proves that the method can effectively solve the information extraction task of complex webpages.

Association Rule Mining Algorithm Based on Timestamp and Vertical Format

WANG Bin, MA Jun-jie, FANG Xin-xiu, WEI Tian-you

Computer Science. 2019, 46 (10): 71-76. doi:10.11896/jsjkx.190100223

Abstract

PDF(1356KB) ( 1317 )

References | Related Articles | Metrics

The SLMCM algorithm (Specific Later-marketed Consequent Mining) is mainly used to solve the problem of later item,but it is inefficient and difficult to adapt to big data mining.For this problem,this paper proposed the improved algorithms E-SLMCM and DE-SLMCM .E-SLMCM algorithm is based on vertical structure,so it only traversal the database twice.Furthermore,the timestamp of each item can be directly calculated when the format is converted to vertical,and each transaction does not need to be sorted by the timestamp of the item.In addition,a new method for finding the itemset timestamp was proposed,which dose not need to traverse the database to find the timestamp of itemset.In order to adapt to dense database,DE-SLMCM algorithm was proposed based on E-SLMCM algorithm and diffset,which improves the execution efficiency on dense database.In the listed four simulation experiments based on common data sets,the time efficiency of E-SLMCM and DE-SLMCM running on sparse and dense data sets is 10-1000 times higher than that of SLMCM.

Study on Point-of-interest Collaborative Recommendation Method Fusing Multi-factors

CHEN Jiong, ZHANG Hu, CAO Fu-yuan

Computer Science. 2019, 46 (10): 77-83. doi:10.11896/jsjkx.180901757

Abstract

PDF(1875KB) ( 1211 )

References | Related Articles | Metrics

Point-of-interest (POI) recommendation is a task to recommend geographical locations that users may be interested in.It is an important researches in location-based social networks (LBSN) services.For the existing problems that POI recommendation currently has lower recommendation precision,lacks of personalization in recommendation results,and has poor integration of sentimental orientation factors,etc.,this paper proposed a POI collaborative recommendation model(GCSR) fusing multi-factors based on the comprehensive analysis of POI related influencing factors,such as geographical location,category preference,popularity,social and sentimental orientation and so on.Firstly,the geographical relevance score is calculated based on POI geographical location data.Secondly,category preference score is defined according to users’ category preference and POI popularity.Then,the strength of the social relationships between users is calculated based on the social relationships,the sentimental orientation score of users is calculated by mining the comment text,and the two are effectively combined with the collaborative filtering recommendation technology to obtain the social sentiment score.Finally,geographical relevance score,category preference score and social sentiment score are effectively integrated to recommend Top-N POI.Multiple comparative experiments conducted on Foursquare’s real check-in datasets demonstrate that the GCSR model achieves better recommendation effect,with an ave-rage improvement of 1.7% and 0.6% in precision and recall,compared with the best effective JRA in the baseline models.

Stock Recommendation System Based on Deep Bidirectional LSTM

ZENG An, NIE Wen-jun

Computer Science. 2019, 46 (10): 84-89. doi:10.11896/jsjkx.180901771

Abstract

PDF(1824KB) ( 2516 )

References | Related Articles | Metrics

With the diversity of applications scenarios and rapid growth of data,the stock prediction models based on classical statistical methods are unable to meet the requirements for high prediction accuracy.But traditional stock reco-mmendation models either never consider the time factor or just consider the unidirectional relationship over time.However,existing stock recommendation models based on deep learning rarely consider the time factor.This paper proposed a deep bidirectional LSTM model for stock prediction,which makes full use of context relationship in the forward direction and backward direction of time series.The problem of vanishing gradient and exploding gradient are solved by introducing LSTM when dealing with long-term sequence.The proposed model can learn information which has long-term dependence on time.At the same time,dropout strategy is introduced to prevent over-fitting caused by deep network model and speed up the training.Experiments on S&P500 dataset show that the neural network prediction model based on the deep bidirectional LSTM outperforms the existing prediction models,the error is about 5% lower,and the coefficient of determination (r²) is increased by 10%.

Satellite Reactive Scheduling Based on Heuristic Algorithm

ZHANG Ming, WEI Bo, WANG Jin-dong

Computer Science. 2019, 46 (10): 90-96. doi:10.11896/jsjkx.180901806

Abstract

PDF(1564KB) ( 1308 )

References | Related Articles | Metrics

For the sudden events such as earthquakes and fires,it is necessary to dynamically adjust the satellite dispatching plan.Considering the dynamic uncertainty factors such as satellite resource failure and joining emergency task,combining task constraints,time constraints,satellite energy and storage constraints,this paper designed an event-driven strategy based on trigger rules,constructed a multi-objective optimazation model regarding maximized scheduling gain and minimum disturbance measure as objective function,and then proposed a heuristic algorithm considering task merging,insertion,shifting and replacement.The simulation results show that the event-driven strategy based on trigger rules can balance the number of triggers,task completion rate and response time,and it is an effective reactive driving strategy.Compared with other three algorithms,the MISR-HA algorithm improves the scheduling gain by an average of 14.78%,reduces the disturbance measurement by an average of 41.91%,and reduces the running time by an average of 14.63%,thus proving the effectiveness of the algorithm.

M-APSK Signal Modulation and Demodulation Method Approaching Gaussian Channel Capacity

JIANG Xuan-you, WEI Yi-min, WANG Lei, LIU Ling-jun, PENG Lei

Computer Science. 2019, 46 (10): 97-102. doi:10.11896/jsjkx.180901777

Abstract

PDF(2226KB) ( 2081 )

References | Related Articles | Metrics

In the digital communication system,when a discrete signal uniformly distributed on the constellation is transmitted through the AWGN channel with limited power and noise power spectral density,the maximum information rate can’t reach the Gaussian channel capacity.Aiming for improving transmission rate to approach the channel capacity,a non-uniform distribution design of signal constellation is quite necessary.Therefore,Box-Muller transform is used to construct the M-APSK signal constellation obeying the Gaussian distribution when the constellation points approach infinity .Simulation results show that the maximum information rate is equal to the Gaussian channel capacity.Compared with rectangular M-QAM signal,the performance improves considerably when the modulation order is sufficiently high.On this basis,according to the distribution characteristics of the constellation,this paper designed a modulation and demodulation scheme based on Gray coding and simplified Max-Log LLR algorithm which reduces system complexity significantly.At last,the system complexity and bit error rate performance of the proposed scheme were verified through Matlab simulation.

Sub-sampling Signal Reconstruction Based on Principal Component Under Underdetermined Conditions

WANG Peng-fei, ZHANG Hang

Computer Science. 2019, 46 (10): 103-108. doi:10.11896/jsjkx.190700195

Abstract

PDF(2054KB) ( 1263 )

References | Related Articles | Metrics

The traditional sampling and reduction method has low utilization and processing efficiency of information data with high consumption of resource,so it cannot adapt to the perception of battlefield information which is constantly changing.Under the complicated electromagnetism circumstance,dynamic changes of the measuring dimensions increase the difficulty of signal acquisition and reduction.In the case of massive multi-input and multi-output wireless communication systems,this paper proposed a sub-sampling reconstruction scheme based on the theory of compressed sensing by using the sparse characteristics of signal data in transform domain space.This scheme uses principal component basis transformation to achieve sparse structure of the signal matrices,and completes the restoration of signal data with sub-sampling by using subspace pursuit.The proposed algorithm is robust to the dynamic changes of the measuring dimension,and it also avoids the high-order matrices participating in the iterative operation process,which can make the algorithm have better accuracy and efficiency of solution by blocking the signal matrices.Finally,the efficient reconstruction of information data under underdetermined conditions is achieved.

Throughput Optimization Based Resource Allocation Mechanism in Heterogeneous Networks

ZHANG Hui-juan, ZHANG Da-min, YAN Wei, CHEN Zhong-yun, XIN Zi-yun

Computer Science. 2019, 46 (10): 109-115. doi:10.11896/jsjkx.180901787

Abstract

PDF(1945KB) ( 1097 )

References | Related Articles | Metrics

Aiming at the problem of interference and spectrum resource allocation optimization caused by D2D (Device-to -Device)communication multiplexing uplink channel of heterogeneous cellular networks,this paper proposed a resource allocation scheme based on improved particle swarm optimization algorithm,and combined the proposed algorithm with the improved closed-loop power control algorithm for resource management.This scheme ensures user’s Quality of Service (QoS) by setting the Signal-to-Interference Noise Ratio (SINR) threshold.After the resource allocation is performed for the D2D user by using the improved particle swarm optimization algorithm,the user’s transmit power is dynamically adjusted by the closed-loop power control algorithm based on the received signal-to-interference noise ratio to reduce interference.Simulation results show that the proposed scheme can effectively suppress the interference problems caused by the introduction of D2D users in heterogeneous communication systems,and improve the utilization of spectrumand the throughput of the system.

Information Diffusion Path Inferring Algorithm Based on Communication Data

XIANG Ying-zhuo, WEI Qiang, YOU Ling

Computer Science. 2019, 46 (10): 116-121. doi:10.11896/jsjkx.180901759

Abstract

PDF(1926KB) ( 1291 )

References | Related Articles | Metrics

Information diffusion and propagation play an important role in viral marketing and virus diffusion.How-ever,in many occasions only the connected data of users in the network can be obtained,which makes it difficult to obtain the content of communication between users.To deal with such challenges,this paper proposed an information diffusion model based on probability in order to predict the relativity of communications between users,and then infer the diffusion path of information in the network.In addition,this paper proved that the complexity of solving the model is NP-hard,and proposed PathMine algorithm to get a near optimal solution.Experiment results show that the proposed PathMine algorithm outperforms other state-of-art algorithms.

RFID Multi-reader Channel Resources Allocation Algorithm Based on Whittle Index

SHI Jing, ZHENG Jia-li, YUAN Yuan, WANG Zhe, LI Li

Computer Science. 2019, 46 (10): 122-127. doi:10.11896/jsjkx.180801602

Abstract

PDF(1909KB) ( 1118 )

References | Related Articles | Metrics

This paper proposed amulti-reader channel resources allocation algorithm based on Whittle index for the problem that how to allocate the correspondence between tags and channel resources in the environment of Radio Frequency Identification (RFID) system with multi-tag and multi-reader.The Restless Multi-Armed Bandit (RMAB) mo-del is established and the Whittle index algorithm is adopted to solve the problem of RFID multi-reader channels allocation.According to the previous state of channels:idle or busy,the readers achieve one trust value for every channel based on its idle probability and use the current trust value to calculate the Whittle index for each channel.The tags choose the one with the largest index value as a possible sensing and accessing channel.Then,the trust value of the channels is dynamically updated based on the successful or failing data transmission in each time slot.For the collisions that may occur between tags during channel allocation,the method of reselecting and accessing based on the identified feedback information after waiting for one time slot can solve it.The proposed algorithm is compared with the typical DiCa algorithm and Gentle algorithm in the following two aspects.Firstly,the system throughput varies with the number of tags to be identified under the premise that the number of readers is fixed.Secondly,the system throughputvaries with the number of readers to be identified under the premise that the number of tags is fixed.The simulation results show that the system throughput of the proposed algorithm is superior to DiCa algorithm and Gentle algorithm in both cases.The throughput increases by an average of 150.34% and 23.98% respectively on the premise of fixed number of readers,205.01% and 43.37% respectively on the premise of fixed number of tags to be identified.Moreover,with the increase of the number of readers and tags to be identified,the advantages of the proposed algorithm in terms of system throughput are more obvious.Therefore,the proposed algorithm can be used to reasonably allocate the limited channel resources,thus effectively improving the recognition efficiency of the RFID multi-reader system.

Workflow Scheduling Strategy with Multi-QoS Constraint Based on Priority in Cloud Environment

DU Yan-ming, XIAO Jian-hua

Computer Science. 2019, 46 (10): 128-134. doi:10.11896/jsjkx.180801591

Abstract

PDF(3841KB) ( 1457 )

References | Related Articles | Metrics

For implementing the trade-off optimization between the execution time and cost of scientific workflow scheduling in cloud environment,this paper proposed a Time-Cost trade-off workflow Task Scheduling algorithm (TCTS) under bi-constrainted condition of deadline and budget.TCTS divides the solving process of the optimal schedu-ling scheme into two stages:the resource levelscheduling stage and the task level scheduling stage.In the resource level scheduling stage,the algorithm defines the priority of a task by the upward rank,and selects the suitable resource set satisfying bi-QoS constraints for tasks according to tasks’ rank.Further,in the task level scheduling stage,the algorithm defines four rules of selecting the optimal resource based upon Time-Cost trade-off,which can obtain the optimal workflow scheduling scheme.This paper elaborated the idea of the proposed algorithm by a designed example.Through the simulation tests of real-world scientific workflows,the proposed algorithm is compared with the same types of algorithms.The results show that under the constraint conditions with different tight degrees,the proposed algorithm has better perfor-mance on some indexes such as the scheduling cost,the scheduling time and the schedule success,which will effectively realize the balanced scheduling.

NFV-based Mechanism to Guard Against UDP Control Packet Redundancy in SDN Controller

XUE Hao, CHEN Ming, QIAN Hong-yan

Computer Science. 2019, 46 (10): 135-140. doi:10.11896/jsjkx.180901659

Abstract

PDF(1705KB) ( 1023 )

References | Related Articles | Metrics

Although the security of software-defined networking (SDN) obtains great attention,the threat of SDN controllers from the UDP duplicate packets in a heavy flow has not been eliminated yet.In response,based on the features of SDN and network function virtualization (NFV) technology,combining the load condition of SDN controller in handling both UDP and TCP data streams,firstly,this paper proposed a new NFV-based mechanism to guard against UDP control packet redundancy in SDN controller.The detection middlebox located in front of the OpenFlow switch interface can detect and filter UDP duplicate packets effectively.Secondly,this paper put forward a cost-effective NFV-based implementation method of detection middlebox.The detection middlebox is implemented by the Linux container and only the first UDP flow packet is allowed to pass through before a path is established by the SDN controller,ensuring that subsequent UDP flow packets already have relevant flow table entry when they reach the OpenFlow switch.Finally,this paper implemented and tested the prototype system of the mechanism in Linux server.The experimental results demonstrate that the method can effectively free from threat of the UDP redundant packets when the setting of the delay t of non-first packets is larger than or equal to the time for controller processing a single packet.

Clone Detection Algorithm for Binary Executable Code with Suffix Tree

ZHANG Ling-hao, GUI Sheng-lin, MU Feng-jun, WANG Sheng

Computer Science. 2019, 46 (10): 141-147. doi:10.11896/jsjkx.180801573

Abstract

PDF(1550KB) ( 1317 )

References | Related Articles | Metrics

How to detect code clones is an important issue in software maintenance and software infringements.Clone detection techniques based on source code tend to fail in the infringement disputes of commercial software due to trade secret.Therefore,in the scenario when the source code is unavailable for detection,this paper presented a clone detection algorithm based on binary executable file analysis.Firstly,instruction type sequences of binary executable files are obtained by decompilation instruction type abstraction,then a suffix tree is constructed based on these instruction type sequences.The clone pairs among functions can be figured out based on this suffix tree.In addition,this paper eliminated gravel instructions for enhancing performance.At last,based on MIPS32 instruction set,this paper used respectively Linux kernel and obfuscated test code as samples on clone level 0－level 2 and level 1－level 4 to compare with the source code detection tools.Test results show that even in the scenario where the source code is lacking,this algorithm can also perform fine-grained clone analysis and has high detection performance for code clones at all levels.

Big Data Oriented Privacy Disclosure Detection Method for Information Release

KE Chang-bo, HUANG Zhi-qiu, WU Jia-yu

Computer Science. 2019, 46 (10): 148-153. doi:10.11896/jsjkx.190100050

Abstract

PDF(1479KB) ( 1023 )

References | Related Articles | Metrics

In order to prevent cloud services from illegally acquiring user’s personal sensitive privacy information,this paper proposed a privacy information publishing detection and protection method for big data.Firstly,the user’s privacy data are classified,and the similarity and the disclosure cost of privacy data are measured respectively.Secondly,accor-ding to the similarity and the disclosure cost,this method detectes whether privacy data required by cloud services contain disclosure chain and key privacy data.Thirdly,continuous privacy datasets,including disclosure chains and key privacy data are decomposed.At the same time,discrete datasets are prevented from being composed into continuous datasets,which do not contain disclosure chains and key privacy data.At last,privacy data chains between discrete privacy datasets and original privacy datasets are discovered by experiment.In terms of precision and recall,the precision of Exact filter is 57% lower than that of non-discrete data sets,while the recall rate is less than 17%.Therefore,the proposed method achieves the purpose of protecting user’s sensitivity privacy information.

Cluster-based Social Network Privacy Protection Method

ZHOU Yi-hua, ZHANG Bing, YANG Yu-guang, SHI Wei-min

Computer Science. 2019, 46 (10): 154-160. doi:10.11896/jsjkx.180901749

Abstract

PDF(1712KB) ( 1428 )

References | Related Articles | Metrics

With the rapid development of social networks,social networks have accumulated a large amount of data,which reflect the social laws to some extent.Aiming at mining effective knowledge under the premise of ensuring privacy,this paper proposed a clustering-based social network privacy protection method.The method has the characteristics of adaptive privacy protection strength,high security and effectiveness of anonymity model.Clustering is conducted based on user information and social relationships.It clusters all nodes in the social network into a super point containing at least k nodes according to the distance between nodes,and then the super points are anonymized.Anonymous super points can effectively prevent various types of privacy attacks taking node attribute privacy and sub-graph structure as background knowledge,so that attackers cannot identify users with a probability greater than 1/k.According to the characteristics of clustering algorithm and social network,the initial node selection algorithm and node spacing calculation method in clustering process are optimized,and by combining the adaptive thinking,the selection method of privacy protection strength is also optimized,which effectively reduces information loss and improves data validity.Experiments were carried out on Matlab platform with different data sets.The results show that the proposed method issuperior to other related methods in terms of information loss and running time,which further proves its effectiveness and security.

Network Security Situation Prediction Method Based on NAWL-ILSTM

ZHU Jiang, CHEN Sen

Computer Science. 2019, 46 (10): 161-166. doi:10.11896/jsjkx.180901820

Abstract

PDF(1818KB) ( 1381 )

References | Related Articles | Metrics

Security situation is the premise of network security warning.The network attacks in complex network environment bring unexpected challenges,causing the sudden network security incidents such as increasing network load and network failure happen at any time.Therefore,taking into account the uncertainty and non-linearity of network security situation time series,in order to further improve the forecast accuracy of network security situation,this paper proposed a network security situation prediction method based on NAWL-ILSTM (Nadam with Look-ahead and Improved Long Short-Term Memory).Firstly,an online updating mechanism is adopted to improve the LSTM to establish time series forecasting model,which can conduct parameter updating in real time for the received online observed data and minimize the cost function,thus solving the problem that traditional LSTM algorithm can’t use network system to transmit data online reasonably,further,optimizing the parameter updating and improving the forecast accuracy of LSTM model.Then,aiming at the problems of slow convergence speed and high training cost in the training process of neural networks,the Look-ahead technology is used to improve the updating formula of Nesterov acceleration gradient adaptive estimated momentum algorithm (Nadam) to accelerate the convergence speed of the model,and then the trai-ning speed of ILSTM prediction model can be accelerated to reduce training time and cost.The simulation experiments based on Pythonin tensorflow environment demonstrate the rationality of the LSTM prediction model based on online updating mechanism.Convergence analysis and comparison experiments show the NAWL algorithm has faster convergence speed.Finally,the comparison experiments show that the proposed model based on NAWL-ILSTM has stronger applicability and higher applicability in situation time series analysis compared with other prediction model.

Social-aware D2D Secure Caching Algorithm

ZHANG Can, SHI Xin, WANG Meng

Computer Science. 2019, 46 (10): 167-172. doi:10.11896/jsjkx.180901776

Abstract

PDF(1640KB) ( 947 )

References | Related Articles | Metrics

Device-to-Device (D2D) content sharing technology is faced with some serious security challenges,which can be adopted to make users more conveniently and efficiently obtain contents.For D2D content sharing scenarios with eavesdroppers,secure caching mechanism based on Maximum Distance Separable (MDS) coding was designed to improve caching performance and meet security requirement.Firstly,content provider selection scheme based on users’ distance and social trust was designed to avoid that the eavesdroppers were selected to cache contents,and select content providers who could obtain higher transmission performance.Based on this,social-related energy efficiency was proposed to ensure the transmission performance and make contents be cached in the content providers that users believed in by using social trust and energy efficiency.Further,content caching placement was designed by maximizing social-related energy efficiency,whose core advantage is that secrecy constraints based on MDS coding was proposed to achieve that eavesdroppers could not obtain enough encoded packets to recover original contents.Finally,social-aware D2D secure caching algorithm based on MDS coding was proposed.Simulation results show that compared with the D2D content secure caching algorithm that content providers are selected randomly,the optimal performance of the proposed algorithm is improved by 15%.Compared with the D2D content caching algorithm without secrecy constraints,the proposed algorithm needs to achieve information security with a performance loss of 42%,but it can achieve its optimal performance when the content provider’s cache capacity is lower.

Study on Intrusion Detection Based on Deep Convolution Neural Network

DING Hong-wei, WAN Liang, ZHOU Kang, LONG Ting-yan, XIN Zhuang

Computer Science. 2019, 46 (10): 173-179. doi:10.11896/jsjkx.180801429

Abstract

PDF(2224KB) ( 2316 )

References | Related Articles | Metrics

Compared with the previous network data,network data shows more huge,complex and multidimensional characteristics nowadays.In face of the high dimensional data features, traditional machine learning methods need to extract a large number of features manually.Besides,feature extraction process is complex and computational,which is not conducive to the current network intrusion detection real-time and accuracy requirements.Deep learning methods have good advantages in dealing with complex data,which can automatically extract better representation features from the data.In this paper,an intrusion detection method based on deep convolution neural network was proposed.Firstly,a method of transforming network data into images was proposed.Then a deep convolution neural network model was designed for the transformed image,which uses the two-layer convolution layer and the pool layer to reduce the dimension of the image,and introduced the Relu function as a new nonlinear activation in place of the traditional neural network.The sigmoid or Tanh function was used to speed up the convergence of the network,and the Dropout method was introduced in the model to prevent the network model from over-fitting.Finally,the image was trained and identified by constructing the completed depth convolution neural network model.The experimental results show that the proposed method has better detection accuracy,lower false alarm rate and higher detection rate compared with the existing me-thods.

Face Anti-spoofing Detection Using Color Texture Feature

BAO Xiao-an, LIN Xiao-dong, ZHANG Na, XU Lu, WU Biao

Computer Science. 2019, 46 (10): 180-185. doi:10.11896/jsjkx.180901688

Abstract

PDF(2273KB) ( 1510 )

References | Related Articles | Metrics

Aiming at the difficulty that face recognition system could be easily deceived by face photos and face videos,a face anti-spoofing detection algorithm were proposed,which uses the fusion color texture features.At present,the main face anti-spoofing detection algorithms are divided into user-matched detection and silent detection.For hot online authentication system nowadays,silent detection has become popular because of its good user experience and accuracy of classification results.Different from the currently popular methods based on brightness characteristics and image quality analysis,the proposed method studies the effectiveness of color features and combined texture features,and then the method combining brightness features,color features and local texture features is proposed.Firstly,the seetaFace algorithm is used to get the coordinates of face and eyes.And then the images which only contain the face are extracted to reduce the interference of background.Secondly,the color information and the brightness information in the image are separated by converting color space and color channel separation.Finally,the method of extracting fusion local texture features is used to extract features from different channels and the feature vectors extracted by each channel are combined and stretched into one-D feature vector,and SVM(Support Vector Machine)is used to train the classifier.The algorithm was performed on the MSU,ASIA,ULU base-line spoofing face database.The experimental results show that the proposed method performs well on improving classification accuracy.

Storage and Query Model for Localized Search on Temporal Graph Data

ZHAO Ping, SHOU Li-dan, CHEN Ke, CHEN Gang, WU Xiao-fan

Computer Science. 2019, 46 (10): 186-194. doi:10.11896/jsjkx.19100530C

Abstract

PDF(1876KB) ( 1199 )

References | Related Articles | Metrics

The temporal graph data is a graph structure data in which the entities are related to each other,and the entity attributes and the relationships between the entities frequently change.This model is applicable to product and user relationships representation in e-commerce,knowledge graphs that contains the history,and corporate organizational structure management.Aiming at the challenge of establishing a general storage scheme for time-varying graph data,this paper proposed a local-domain query based scheme for storing and retrieving time-varying graph data,which is based on the advantage of graph traversal on graph databases and the advantages of distributed key-value databases,achieving universal expression and provide rich expressions for storing graph data.Experiment results show that the system has significant advantages in the storage of historical attributes.

Foreign Key Detection Algorithm for Web Tables Based on Conflict Dependency Elimination

WANG Jia-min, WANG Ning

Computer Science. 2019, 46 (10): 195-201. doi:10.11896/jsjkx.180901748

Abstract

PDF(1590KB) ( 996 )

References | Related Articles | Metrics

As one of the most important constraints in databases,foreign key relationships between tables are crucial for data integration and analysis.For large amount of web tables,foreign keys are not specified in most cases.Therefore,detection of foreign key becomes a significant step in understanding and utilizing the web tables.Current researches mainly focus on the search for inclusion dependencies between attributes,and foreign key detection methods on traditional relational tables cannot solve the problem about large number of conflicting foreign keys due to the heterogeneity of web tables.Considering the conflict dependency between web tables,a foreign key detection algorithm for web tables based on conflict dependency elimination was proposed.Firstly,the concept of conflict dependency is proposed,and an inclusion dependency graph is established for the candidate foreign key relationship.Subsequently,the layer structure of the inclusion dependency graph is constructed,and the definition for the strength of candidates is given.Finally,based on the layer-by-layer elimination of conflict dependency,true foreign key relationships are obtained.To verify the effectiveness of the algorithm,this paper selected 3 true web table datasets as experimental datasets,which are the WIKI dataset with complete schema specification,the DWTC and WDC datasets without schema information.The proposed algorithm has been compared with the other two foreign key detection methods on above datasets in terms of accuracy,recall and F-measure.The experimental results show that the accuracy,recall andF-measure of the proposed algorithm on WIKI and DWTC are more superior than existing algorithms.The proposed algorithm outperforms other algorithms especially on WDC,the largest and the most up-to-data web table corpus,it’s accuracy,recall and F-measure to 0.89,0.88 and 0.89.Therefore,the proposed foreign key detection algorithm is more suitable for web tables.Compared with the existing methods,it has higher accuracy,recall and F-measure.

Data Replicas Distribution Transition Strategy in Cloud Storage System

WU Xiu-guo, LIU Cui

Computer Science. 2019, 46 (10): 202-208. doi:10.11896/jsjkx.180901623

Abstract

PDF(2045KB) ( 1030 )

References | Related Articles | Metrics

Replication is a common method used to improve the data access reliability and system fault tolerance in the cloud storage system.It is one of the most important topics to reschedule the replicas distribution dynamically according to the changes of users’ requirements and environment in the replicas management.However,current replicas redistribution strategies mostly focus on the new replicas schemes,such as replicas number and their placements,on the pre-mise that it can be completed automatically,without taking into account the task scheduling problem in practice.In fact,data replica distribution transition is a complex scheduling problem involving data replicas migration and deletion among data centers.In addition,the required disk space and time of different scheduling strategies have large differences,lea-ding to big difference in cost and efficiency.In this way,this paper first proposed a data replicas distribution transition model and feasibility analysis in the cloud storage environment.Also a minimum-cost data replicas distribution transition problem definition was presented,and then its complexity was proven based on 0-1 Knapsack problem.Besides random strategy,three transition strategies (MTCF,MOCF and MTCFSD) were given from minimum-cost view.In the end,a series of experiments were performed on CloudSim simulation platform.The results show that nearly 60% of transmission number and 50% of transmission cost are reduced compared with other methods,indicating the proposed method’sreliability and effectiveness,so as to further improve the cloud storage system performance.

Evaluation Model of Software Quality with Interval Data

YUE Chuan, PENG Xiao-hong

Computer Science. 2019, 46 (10): 209-214. doi:10.11896/jsjkx.180801554

Abstract

PDF(1775KB) ( 1041 )

References | Related Articles | Metrics

For the defects of traditional evaluation methods,a new evaluation model of software quality was developed in this paper.First,aimed at the existing problems of present projection measures,a new normalized projection measure is provided in this research.Second,an evaluation model of software quality with interval data is established,which is based on the new projection model and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) technique.Then an assessment procedure is elaborated in a group decision-making setting.The evaluation matrices,weighted evaluation matrices,positive and negative ideal decisions and relative closeness are involved in this model.The evaluation information is based on a questionnaire survey.Finally,the effectiveness and feasibility of the developed method are illustrated by a practical example and an experimental analysis.The experimental results show that the evaluation model has the advantages in robustness and practicability.

Multi-view Attentional Approach to Single-fact Knowledge-based Question Answering

LUO Da, SU Jin-dian, LI Peng-fei

Computer Science. 2019, 46 (10): 215-221. doi:10.11896/jsjkx.190400071

Abstract

PDF(1677KB) ( 1281 )

References | Related Articles | Metrics

Knowledge base question answering (KB-QA) has received extensive attention in recent years,and becomes an important natural language processing task.In the knowledge base question answering task,simple question refers to the question that can be answered by a single-fact of the knowledge base.For this task,the existing approaches mainly map the question and the KB fact into a common vector space and calculate their similarity to get the answer.But this approach would lose part of the semantic interaction information of the original words.To solve this problem,a multi-view attention-based relation detection approach was proposed,which aims to model the correlation between the question and the KB relation from multiple perspectives,and preserves more original interaction information so as to improve the accuracy of the approach.In addition,in order to alleviate the impacts of noisy data and improve the accuracy of entity recognition,this paper also encoded the question by combining the dynamic word vectors based on the language model and the part-of-speech feature of the word during the process of entity linking.Finally,experiments conducted on the SimpleQuestions dataset based on FB2M and FB5M achieve the accuracy results of 78.9% and 78.3%,respectively,which illustrative the effectiveness of the proposed approach for reflecting the semantic correlation between the question and the KB relation,and reflect the improvements of the accuracy for single-fact KBQA.

Improved NSGA-II Algorithm Based on Loser Group and Hybrid Coding Strategy

LIU Xin-ping, GU Chun-hua, LUO Fei, DING Wei-chao

Computer Science. 2019, 46 (10): 222-228. doi:10.11896/jsjkx.181001852

Abstract

PDF(2284KB) ( 1120 )

References | Related Articles | Metrics

The congestion coefficient operator of NSGA-II in elite selection can not optimize the distribution of local congestion area effectively,and some individuals closer to Pareto optimal solution set will be eliminated.An improved algorithm based on loser group and hybrid encoding strategy (LGHC-NSGA-II) was proposed to overcome the shortcomingthat excellent individuals are not retained in the congestion coefficient operator.Referring to the double-losing elimination system in chess games,an external archive set of loser group is constructed.After the iteration,the archive set is merged with the last generation parent population,and the distribution coefficient is optimized by cyclic congestion coefficient ranking strategy.At the same time,a hybrid coding strategy was proposed to overcome the shortcomings of traditional coding methods in global or local space,which effectively improves the convergence of the algorithm.The test results on ZDT series problems show that the improved algorithm is superior to eight multi-objective evolutionary algorithms in convergence,distribution and robustness.

Interval-valued Intuitionistic Fuzzy Entropy Based on Exponential Weighting and Its Application

ZHANG Mao-yin, ZHENG Ting-ting, ZHENG Wan-rong

Computer Science. 2019, 46 (10): 229-235. doi:10.11896/jsjkx.180901738

Abstract

PDF(1253KB) ( 1115 )

References | Related Articles | Metrics

Entropy is an important means to describe the uncertainty degree of fuzzy sets.To depict the uncertainty of interval-valued intuitionistic fuzzy sets,this paper first put forward the definition of core interval of interval-valued intui-tionistic fuzzy sets based on Hukuhara difference (H-difference) of interval numbers,which can effectively reflect the fuzziness generated by the force comparison between membership degree and non-membership degree of interval-valued intuitionistic fuzzy sets.Considering the uncertainty of interval-valued intuitionistic fuzzy sets is codetermined by the fuzziness and hesitancy,this paper proposed the basic criterion of the uncertainty measurement of interval-valued intui-tionistic fuzzy sets,which more accords with human intuition.Due to the difficulty for completely determining the proportion of fuzziness and hesitancy,in order to better describe the influence of the fuzziness and hesitancy on the uncertainty degree of interval-valued intuitionistic fuzzy sets,this paper presented a new interval-valued intuitionistic fuzzy entropy based on the exponential weighted method.Comparison example analysis under properties discussion and diffe-rent for interval-valued intuitionistic fuzzy entropy demonstrates that when hesitancy degree interval is the same,theinterval-valued intuitionistic fuzzy entropy decreases with the increase of the number of left and right intervals of the core interval,and when core interval is the same,the new fuzzy entropy increases with the increase of the number of left and right intervals of the hesitancy degree interval,which are accord with basic principle of uncertainty measurement.The proposed method completely shows that the uncertainty can increase with the increase of the fuzziness and hesitancy,which is accordance with human intuition.Secondly,this paper analyzed and verified that when the interval-valued intui-tionistic fuzzy sets degenerate into the intuitionistic fuzzy sets,the new fuzzy entropy constructed by the proposed me-thod can measure the degree of uncertainty of the intuitionistic fuzzy sets effectively.Finally,the proposed new entropy formula is applied effectively in multiple attributes decision-making analysis with unknown attribute weights and the rationality of the method is verified by an example,which provides a new way to solve multi-attribute decision-making problem.

Formal Vector Method of Rule Extraction for Consistent Decision Information System

YAN An, YAN Xin-yi, CHEN Ze-hua

Computer Science. 2019, 46 (10): 236-241. doi:10.11896/jsjkx.190200270

Abstract

PDF(2039KB) ( 1043 )

References | Related Articles | Metrics

Knowledge representation and acquisition is one of the key problems in the field of artificial intelligence,and rule acquisition is one of the important research contents.Formal concept analysis is an effective method to deal with big data and uncertain knowledge,which is widely used in knowledge representation and data mining.Formal concept analysis can realize the rule extraction of decision information system.Firstly,the decision information system is transformed into the formal context,then the concept is generated by the formal context and the rules are acquired by concept operation.However,the generation of concepts is a complex computational process,and the generated rules are often redundant.On the basis of the formal context,the formal vector and its properties were defined and discussed,the tree topology diagram of formal vector was constructed,and a rule extraction algorithm of the decision information system based on the formal vector was proposed.The algorithm is based on granular computing,the formal vector in each layer from coarse to fine granularity space is computed,and the rules are extracted by the relationship between the conditional formal vectors and the decision formal vectors.The visualization of the rule extraction process is realized based on the tree topology diagram.And the actual time cost of the rule extraction process is greatly reduced by pruning operation.Finally,the correctness and effectiveness of the algorithm were verified by mathematical proof and case analysis.The comparison experiment also shows that the algorithm has better timeliness and higher recognition rate at the same time.

Multi-class-specific Attribute Reduction of Lower Approximation in Ordered Decision Tables

YU Tian-you, ZHANG Nan, YUE Xiao-dong, TONG Xiang-rong, KONG He-qing

Computer Science. 2019, 46 (10): 242-251. doi:10.11896/jsjkx.180901781

Abstract

PDF(2273KB) ( 1068 )

References | Related Articles | Metrics

Attribute reduction is one of the important research topic in rough set theory.The minimal feature subset of a given information system can be obtained by attribute reduction.Classical attribute reduction in ordered decision system is about all decision classes in decision attribute.However,in practice,considering the preference of decision maker and the lack of some decision data,only attribute reduction of some specific decision classes is needed.Based on this consideration,in this paper,the dominance relations and lower approximation reduction in ordered decision table were reviewed,the single-class-specific and multi-class-specific lower approximation reduction in ordered decision table were presented,a discernibility matrix for this reduction was introduced,and an algorithm of lower approximation attribute reduction in ordered decision system was proposed.The multi-class-specific lower approximation reduction can degene-rate to single-class-specific reduction or classical reduction,so it is a more general reduction framework.In the experiment,6 sets of UCI data sets were used,3 single-class-specific reducts and 3 multi-class-specific reducts were calculated on each data set,and the reduction result and efficiency were compared with the result and efficiency of classical lower,upper approximation reduction and maximum distribution reduction.Experiment results show that when the number of selected specific classes is less than all decision classes,the reduct may be shorter,and the efficiency will be improved to varying degrees.

Distant Supervision Relation Extraction Model Based on Multi-level Attention Mechanism

LI Hao, LIU Yong-jian, XIE Qing, TANG Ling-li

Computer Science. 2019, 46 (10): 252-257. doi:10.11896/jsjkx.180901780

Abstract

PDF(1655KB) ( 1676 )

References | Related Articles | Metrics

As one of the main tasks of information extraction,entity relation extraction aims at determining the relationship category of two entities in unstructured text.At present,the supervised method with high accuracy is limited by the need for a large number of manual tagging corpus.The distant supervision method obtains a large number of relational triples by heuristic alignment between knowledge base and text set,which is the main way to solve the large-scale relational extraction task.In order to solve the problems that the high-dimensional semantics of words in sentence context are not fully utilized and the dependency-inclusion relationship between relationships is not considered in the current research on distant supervision relation extraction,this paper proposed a multi-level attention mechanism model for distant supervision relation extraction.In this model,the high-level semantics of sentences are obtained by utilizing the bidirectional GRU(Gate Recurrent Unit) neural network to code the sentence word vectors.Then,the word-level attention is introduced to calculate the degree of correlation between two entities and the context words,thus capturing the semantic information of the entity context in sentences adequately.Next,the sentence-level attention is constructed on multiple instances to reduce the tag error annotation problem.Finally,the dependency-inclusion relationship between different relationships is automatically learned by the relation-level attention.The experimental results on FreeBase+NYT public dataset show that the introduction of word-level,sentence-level and relation-level attention mechanisms on the basis of bidirectional GRU model can improve the effect of distant supervision relation extraction.Compared with the existing mainstream methods,the multi-level attention mechanism relation extraction model obtained by integrating three levels attention mechanisms improves the accuracy and recall rate by about 4%,which achieves better relation extraction effect,thus providing a theoretical foundation for further constructing the knowledge graph and intelligent question answering applications.

Sentiment Analysis Based on Word Embedding Auxiliary Mechanism

HAN Xu-li, ZENG Bi-qing, ZENG Feng, ZHANG Min, SHANG Qi

Computer Science. 2019, 46 (10): 258-264. doi:10.11896/jsjkx.180901687

Abstract

PDF(1285KB) ( 1706 )

References | Related Articles | Metrics

Text sentiment analysis is an important research direction in natural language processing,and how to analyze the sentiment polarity of long text is a research hotspot.At present,majority of the researches tend to apply word embedding in neural network models for sentiment analysis.Although this method has a good representation ability of word features,it has some weaknesses for long text.Extremely long text will bring heavy burden to the model and make it consume more time and resources in training process.In light of this,this paper proposed an attention neural network model based on word embedding auxiliary mechanism,namely WEAN,and it is applied to sentiment analysis for long text.The model deals with some training burdens of long text in neural network model by using word embedding auxi-liary mechanism and utilizes bidirectional recurrent neural network and mechanism to obtain context information.At the same time,it captures information with different importance degree in sequences,thus improving the sentiment classification performance.Experiment was conducted on IMDB,Yelp 2013 and Yelp 2014 datasets.The results show that the accuracy of the sentiment analysis of the proposed model increases by 1.1%,2.0% and 2.6% respectively on three datasets compared with NSC+LA model.

Reinforcement Learning Algorithm Based on Generative Adversarial Networks

CHEN Jian-ping, ZOU Feng, LIU Quan, WU Hong-jie, HU Fu-yuan, FU Qi-ming

Computer Science. 2019, 46 (10): 265-272. doi:10.11896/jsjkx.180901655

Abstract

PDF(1721KB) ( 1660 )

References | Related Articles | Metrics

With respect to the slow learning rate caused by the lack of experience samples at the early stage for most traditional reinforcement learning algorithms,this paper proposed a novel reinforcement learning algorithm based on the generative adversarial networks.At the early stage,the algorithm collects a small amount of experience samples to construct a real sample set by a stochastic policy,and utilizes the collected samples to train GAN.Then,this algorithm uses the GAN to generate samples to construct a virtual sample set.After that,by combining two sample set,this algorithm selects a batch of samples to train value function network,thus improving the learning rate to some extent.Moreover,combining a deep neural network,this algorithm introduces a new model namely rectified relationship unit to train the internal relationship between the state,action and the next state and reward,feedbacks the GAN with the relative entropy and improves the sample quality generated by GAN.Finally,this paper applied the proposed algorithm and DQN algorithm to the traditional CartPole and MountainCar problem on OpenAI Gym platform The experimental results show that the learning rate is accelerated effectively and the convergence time is cut down by 15% through the proposed method compared with DQN.

Research on Intelligent Vehicle Speed Planning Algorithms Based on Trapezoidal Planning Curve

CAO Bo, LI Yong-le, ZHU Ying-jie, JIA Bin, XU You-chun

Computer Science. 2019, 46 (10): 273-278. doi:10.11896/jsjkx.190400147

Abstract

PDF(2210KB) ( 2158 )

References | Related Articles | Metrics

Aiming at the problem of short deceleration distance and poor stationarity caused by late deceleration in par-king process when QP (quadratic programming) algorithm is applied to speed planning of intelligent vehicles,there are some problems such as short deceleration distance and poor stationarity caused by late deceleration in the stopping process.This paper presented an intelligent vehicle speed planning algorithm based on the trapezoidal programming curve.Firstly,the QP model of speed planning is established and solved.Then,the stopping process based on trapezoidal programming curve at different initial speeds is analyzed,andits results are considered as nonlinear constraint to instantiate and solved QP model .Finally,the experimental results of QP algorithm and the algorithm were compared and analyzed through simulation experiment and real car experiment.In the simulation experiment,the initial speed of 39.8 km/h,31.5 km/h and 20.6 km/h was used to enter the parking process respectively.The speed curve shows that the proposed algorithm can advance the deceleration time,which preliminarily shows that the algorithm has the optimization effect.In the real vehicle experiment,compared with QP algorithm,the proposed algorithm advances the parking process of the three initial speed by 5.9 s,5.0 s and 3.7 s,the absolute value of the average acceleration decreases by 0.5 m/s²,0.5 m/s²and 0.4 m/s²,the absolute value of the maximum acceleration decreases respectively by 0.16 m/s²,0.33 m/s² and 0.35 m/s².The simulation and real vehicle experiments show that the improved method has obvious improvement effect and significant optimization effect.

Multi-stage Regional Transformation Strategy in Move-based Three-way Decisions Model

GUO Dou-dou, JIANG Chun-mao

Computer Science. 2019, 46 (10): 279-285. doi:10.11896/jsjkx.180801609

Abstract

PDF(1332KB) ( 1038 )

References | Related Articles | Metrics

The basic ideas of three-way decisions (3WD) proposed by Prof Yao is dividing a whole set into three parts and developing different strategies on the three parts.Furthermore,Yao proposed the trisecting-acting-outcome model.Trisecting,acting and outcome are three basic elements of 3WD.In the movement-based 3WD model,the movement of objects leads regional changes,and this chang is called regional transformatio.In the step of “acting”,that “acting” can be one-time or multiple times should be conisdered.In this process,costs or benefits are involved,so the “acting” needs to be further considered from the perspective of economy.Based on three-way decisions from the view of generalization,this paper proposed a three-way decisions model with multi-stage regional transformation,and sought the optimal “ac-ting” by measuring the outcome of “acting”.The optimal transformation strategy was studied,that is the cost optimization of one-time transformation and multiple transformation.In the three-way decisions models with multi-stage regional transformation,the cost of regional transformation is analyzed,and the number of stages is divided according to the number of regional transformation times.A dynamic programming algorithm was proposed to find the optimal transformation strategy,and then the optimal transformation strategy was presented in the case of maximizing the benefit.Finally,an example was given to analyze the one-time and multiple transformation costs of the region,and the optimal transformation times and the optimal transformation costs of multi-stage regional transformation were further obtained.This paper illustrated the effectiveness and the practicability of the algorithm by an example.

Fast Coding Unit Partition Algorithm for Depth Maps

ZHU Wei, YI Yao, WANG Tu-qiang, ZHENG Ya-yu

Computer Science. 2019, 46 (10): 286-294. doi:10.11896/jsjkx.180701337

Abstract

PDF(2752KB) ( 1289 )

References | Related Articles | Metrics

The 3D high efficient video coding (3D-HEVC) is the new generation of video coding standard,which adopts depth maps to reduce the number of viewpoints significantly.Depth maps contain geometric information which can improve the video compression efficiency,but the depth image encoding has a heavy computation and takes about 4 times as long as the color image encoding.In this paper,in order to reduce the computational complexity of the depth image coding,a coding unit (CU) partition algorithm based on texture analysis was proposed for 3D-HEVC intra coding.Firstly,the rough texture analysis is performed for the depth map,the global grayscale classification based on the OTSU method is calculated through the texture characteristics of the depth map,and the texture complexity and the direction identification of CTU are determined by the sample points in CTU.Then,the fine texture analysis is performed on a CTU with high texture complexity,and the statistical features of the pixel distribution in the CUs are used to compute the textural division flags from bottom to top for different size of CUs.Finally,the texture complexity of CTU,texture direction flags and CU texture division flags are utilized to predict the depth range of current CTU and decide whether to terminate the division of CU.Compared with the original algorithm in 3D-HEVC test model,the proposed algorithm can reduce 45% encoding time on average with only 0.8% increase in Bjontegaard delta bit rate under the all-intra configuration.Compared with three state-of-the-art algorithms,the proposed algorithm reduces the encoding time by about 12%,% and 4% respectively for overall sequences,and 14%,11% and 10% respectively for the large-resolution sequences,with a similar encoding rate distortion performance.

Research and Improvement of Web Fingerprint Identification Algorithm Based on Cosine Measure

TANG Wen-liang, TANG Shu-fang, ZHANG Ping

Computer Science. 2019, 46 (10): 295-398. doi:10.11896/jsjkx.180801473

Abstract

PDF(1276KB) ( 1330 )

References | Related Articles | Metrics

In order to realize the accurate identification of Web fingerprints in the Web fingerprint database,it is necessary to study the Web fingerprint identification algorithm.When the current fingerprint recognition algorithm is used to identify the Web fingerprint in the Web fingerprint database,there is an error between the recognition result and the actual result,and the recognition takes a long time,which result in low recognition accuracy and recognition efficiency.Based on the cosine measure,a Web fingerprint identification algorithm was proposed.The source fingerprint method is used to select the Web fingerprint in the four aspects of structural features,static files,cookie design and keywords,and a Web fingerprint database is established.Firstly,the characteristics of the data in the Web fingerprint database are extracted,and the abnormal data existing in the Web fingerprint database are removed according to the feature extraction result.Then,the cosine distance function is used as the similarity measurement function,and the K-means algorithm is used to cluster the Web fingerprints in the Web fingerprint database.Finally,the identification of the web fingerprint is completed according to the clustering result.The experimental results show that the proposed method can accurately complete the Web fingerprint identification in the Web fingerprint database in a short time,and has the advantages of high recognition accuracy and high recognition efficiency.

Parking Anomaly Behavior Recognition Method Based on Key Sentence of Behavior Sequence Features

WANG Hong-nian, SU Han, LONG Gang, WANG Yan-fei, YIN Kuan

Computer Science. 2019, 46 (10): 299-306. doi:10.11896/jsjkx.180901750

Abstract

PDF(4057KB) ( 1232 )

References | Related Articles | Metrics

With the development of technology and the popularity of cameras,people’s demands on intelligent video surveillance are increasing.Anomaly behavior recognition is a key part of intelligent monitoring systems and plays an important role in maintaining social security.Aiming at the spatio-temporal feature of video data,this paper proposed a method of characterizing behavior as a key sentence with time series,termed Key Sentence of Behavior Sequence (KSBS),and realized the anomaly behavior recognition in the parking scenes by learning key sentences of behaviors.Firstly,the motion sequence is segmented,the foreground target is extracted,and the Motion Period Curve (MPC) of the foreground target is calculated.Then,according to the motion cycle curve,the MPC and DTW method are used to extract the behavior key frames.Finally,based on the semantic understanding method in the field of natural language proces-sing,the behavior key frames are characterized as a series of behavior key sentence.In light of time series features of key sentences,LSTM,which is expert in dealing with time series data,is used to classify the key statements of behaviors.In order to solve the existing data imbalance problem,GAN is used to expand the training set,thus increasing the sample space and balancing the difference between different types of data.Validation results on CASIA behavior database and self-built behavior database show that the average recognition rate of the proposed method for anomaly behavior is 97%.It is proved that the Key Sentece of Behavior Sequence can better represent the behavior information and the LSTM model is more suitable for learning the patterns behind the time series data,verifying the effectiveness of the proposed method on anomaly behavior recognition in parking scenes.

Face Recognition Method Based on Adaptively Weighted Sub-pattern Discriminant Neighborhood Projection

YANG Liu, CHEN Li-min, YI Yu-gen

Computer Science. 2019, 46 (10): 307-310. doi:10.11896/jsjkx.190300061

Abstract

PDF(1899KB) ( 975 )

References | Related Articles | Metrics

Face recognition is one of the hot topics in the image processing and pattern recognition,an adaptively weighted sub-pattern discriminant neighborhood projection method was proposed for face recognition.Firstly,the face images are divided into several small blocks,and the sub images with same position are used to construct the sub-pattern set.Then,in order to improve the discrimination ability of low dimensional features,the local data structure information and the label information of each sub pattern set are employed to construct a local discriminant neighborhood graph.Finally,for taking different contribution of different sub-pattern into account,a non negative weight vector is introduced to combine with the local scatter matrices of all sub-pattern sets,in order to find out the complementary information between different sub-image of the same faceimage.The experimental results show that the proposed method is superior to other methods.

Flocking Based on Communication Delay and Noise

WANG Shi-li, JIN Ying-hua, WU Chen

Computer Science. 2019, 46 (10): 311-315. doi:10.11896/jsjkx.180901706

Abstract

PDF(1860KB) ( 972 )

References | Related Articles | Metrics

In real life,flocking is a very common phenomenon.However,in flocking system,due to the limited speed and traffic congestion,generally,there are time delays in transmission and communication between agents,so it is necessary to consider time delay.In addition,in the real-world environment,agents are also vulnerable to noise due to various uncertainties in the external environment,so noise is also inevitable to be considered.Based on these two points,this paper studied the flocking of multi-agent system with communication delay and noise.This paper specifically considered the Cucker-Smale model with communication delay and noise,and verified that the multi-agent system can still obtain a sufficient flocking condition when the amount of delay and the noise intensity of multi-agent system satisfy some certain conditions based on the properties of quadratic function.Finally,a numerical simulation was carried out by using MATLAB,and an example was given.The results show the correctness of the theory.

Fine-grained Geolocalisation of User Generated Short Text Based on LBSN

DENG Yao, JI Wen-li, LI Yong-jun, GAO Xing

Computer Science. 2019, 46 (10): 316-321. doi:10.11896/jsjkx.180901624

Abstract

PDF(1853KB) ( 954 )

References | Related Articles | Metrics

It is significant to use user generated short text (UGST) to estimate user’s fine-grained location.Most exis-ting methods rarely introduce the semantic information about the location in UGST,and do not prioritize the entities according to their importance,thus leading to the decrease of performance.A fine-grained geolocalisation of user-generated short text based on location-based social network (LBSN) was proposed to solve these problems.The proposed algorithm consists of three key components.1) UGST of Foursquare is used to build the tight coupling between entity and location,which can address the location-annotated sparseness problem.2) UGST is filtered out if it does not contain any location-specific entities,which allows us to eliminate the interference of noisy UGSTs.3) The candidate locations for each remaining UGST are ranked based only on its textual data,and the top-ranked location is selected for UGST.The experimental results show the effectiveness of the proposed method.

Change Region Analysis of Configurable Business Process Model Based on Business Capability

YING Li, FANG Xian-wen, WANG Li-li, LIU Xiang-wei

Computer Science. 2019, 46 (10): 322-328. doi:10.11896/jsjkx.180901692

Abstract

PDF(1733KB) ( 929 )

References | Related Articles | Metrics

In order to meet the diverse needs of users,the process model needs to be configured according to the actual needs of users.However,in the process of process model configuration,problems such as unreasonable configuration or changing region are prone to occur.To solve these problems,a method for changing region analysis of configurable business process models based on business capabilities was proposed.Firstly,according to the categories and attributes of the business capabilities,two configurable source models are configured and merged.Then,the difference of connector types between the available business capability annotation models is analyzed to find the change region of the configurable process model.The exact change area of the configurable source model is found through the mapping relationship between the process models.Finally,the relevant example analysis is given to prove the effectiveness of the proposed method.

Dynamic Estimation of Flight Departure Time Based on Bayesian Network

XING Zhi-wei, ZHU Hui, LI Biao, LUO Qian

Computer Science. 2019, 46 (10): 329-335. doi:10.11896/jsjkx.181102039

Abstract

PDF(2194KB) ( 1335 )

References | Related Articles | Metrics

In order to accurately perceive the flight departure process and estimate the flight departure time,a method for estimating flight departure time based on dynamic Bayesian network was designed.Firstly,the factors affecting the flight departure process are analyzed based on different flight attributes.According to the influencing factors,the data are classified and processed,and the Monte Carlo simulation method is combined with the historical data classification to obtain the joint and prior distribution of each link.The Kolmogorov test is used to determine the joint distribution model of each link,and the parameters of the dynamic Bayesian network modelare obtained.Secondly,the Bayesian network architecture and conditional probability are used to infer the dynamic estimation of the departure time and the completion time of each link.Finally,the single-destination operation data of an airport in the middle of the country are selected for simulation verification.The research results show that with the progress of the process,the propagation error will gradually increase,and the estimated accuracy of the departure time will be over 80%,what’s more,the stability of the dynamic estimation result will be better,which can fully reflect the actual situation of the key node in the departure process of the flights.