Computer Science

AI Governance and System: Current Situation and Trend

CHAO Le-men, YIN Xian-long

Computer Science 2021, 48 (9): 1-8. DOI: 10.11896/jsjkx.210600034

The main purpose of AI governance is to take advantage of AI while reducing its risks. AI governance also aims to build responsible AI by taking into account influencing factors such as technology, law, policy, standards, ethics, morality, safety, economy, and society. AI governance has three aspects: individual intelligent governance, group intelligent governance, and governance of human-computer cooperation and symbiotic systems. It can be divided into three levels: the technical level, the ethical level, and the social and legal level. There are four key technologies for AI governance: intelligible AI, defense against adversarial attacks, modeling and simulation, and real-time audit. Industry is mostly concerned with developing responsible AI, as shown by the AI governance practices of leading companies such as Google, IBM and Microsoft. Furthermore, tools for interpretability, privacy protection and fairness checking of AI systems are already in use. At present, the main research topics in AI governance include software-defined AI governance, key technologies of AI governance, AI governance evaluation in large-scale machine learning, AI governance based on federated learning, standardization of AI governance, enhancement of artificial intelligence, and human-in-the-loop AI training.

AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection

LEI Yu-xiao, DUAN Yu-cong

Computer Science 2021, 48 (9): 9-20. DOI: 10.11896/jsjkx.201000011

With the popularity of virtual communities among network users, virtual community groups have become a small society, from which user-related privacy resources can be extracted through the “virtual traces” left by users' browsing and through the content users publish. Privacy resources can be classified into data resources, information resources and knowledge resources according to their characteristics, which constitute the data, information, knowledge, and wisdom graph (DIKW graph). There are four circulation processes for privacy resources in virtual communities, namely the sensing, storage, transfer, and processing of privacy resources. The four processes are completed by three participants (the user, the AI system, and the visitor), individually or in cooperation. The right to privacy includes the right to know, the right to participate, the right to forget, and the right to supervise. By clarifying the scope of the privacy rights of the three participants in the four circulation processes, and combining this with the protection of privacy values, an anonymous protection mechanism, a risk assessment mechanism and a supervision mechanism are designed to build an AI governance legal framework for privacy protection in virtual communities.

Survey on Privacy Protection Solutions for Recommended Applications

DONG Xiao-mei, WANG Rui, ZOU Xin-kai

Computer Science 2021, 48 (9): 21-35. DOI: 10.11896/jsjkx.201100083

In the era of big data, various industries want to train recommendation models on user behavior data to provide users with accurate recommendations. The data used share common characteristics: they are huge in volume, carry sensitive information, and are easy to obtain. While bringing accurate recommendations and market profit, a recommendation system is also sharing users' private data in real time. Differential privacy, as a privacy protection technology, can neatly address the problem of privacy leakage in recommendation applications. Regardless of any background knowledge the attacker may have, differential privacy gives a strict definition of privacy protection and provides quantitative evaluation methods, so that the levels of privacy protection provided on data sets are comparable. First, the concept of differential privacy and the research on mainstream recommendation algorithms are briefly described. Second, the combined application of differential privacy and recommendation algorithms, such as matrix factorization, deep learning recommendation, and collaborative filtering, is analyzed. A large number of comparative experiments have been conducted on recommendation algorithms based on differential privacy. Then the application scenarios of combining differential privacy with each recommendation algorithm, and the remaining problems, are discussed. Finally, suggestions are put forward for the future development of recommendation algorithms based on differential privacy.
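
As an illustration of the differential privacy concept surveyed in this abstract, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query over ratings. It is not taken from the surveyed paper; the function names, the sample ratings and the choice of epsilon are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(ratings, predicate, epsilon, rng):
    """Return a noisy count of the ratings that satisfy `predicate`.

    A counting query changes by at most 1 when one record is added or
    removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in ratings if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)                      # seeded only to make the demo repeatable
ratings = [5, 3, 4, 1, 5, 2, 4, 4]          # hypothetical user ratings
noisy = private_count(ratings, lambda r: r >= 4, epsilon=0.5, rng=rng)
print(noisy)                                # true count is 5, plus Laplace noise
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off that a quantitative privacy budget makes comparable across algorithms.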

Research on Big Data Governance for Science and Technology Forecast

WANG Jun, WANG Xiu-lai, PANG Wei, ZHAO Hong-fei

Computer Science 2021, 48 (9): 36-42. DOI: 10.11896/jsjkx.210500207

Moving from imitation to innovation and from following to leading is not only a major change in China's current scientific and technological development, but also a major strategic demand for national development. In recent years, scholars at home and abroad have studied the analysis of science and technology development trends and the tracking of hot spots. However, for lack of a systematic big data collection and governance system, the scope of data analysis and mining is often limited to a single kind of data sample, namely scientific and technological literature. Aiming at forward-looking prediction of science and technology development, this paper comprehensively analyzes the massive heterogeneous data that affect the development process of science and technology, such as scientific and technological literature of all kinds, scholar dynamics, forum hot spots and social comments. By building a data-driven big data governance system, this paper solves the data remediation problems in the process of detection and discovery, accurate collection, cleaning and aggregation, fusion processing, model construction, and prediction and calculation. On the basis of this big data remediation, an LDA model is used to predict and analyze technology trends. The research results provide technical support for solving the problems of hidden information discovery and relationship reasoning in massive scientific and technological big data.

Time Aware Point-of-interest Recommendation

WANG Ying-li, JIANG Cong-cong, FENG Xiao-nian, QIAN Tie-yun

Computer Science 2021, 48 (9): 43-49. DOI: 10.11896/jsjkx.210400130

In location-based social networks (LBSN), users share their locations and location-related content. Point-of-interest (POI) recommendation is an important application in LBSN that recommends locations likely to interest users. However, compared with other recommendation problems (such as product and movie recommendation), users' preference for POIs depends particularly strongly on time. In this paper, the influence of the time feature on the POI recommendation task is explored, and a time-aware POI recommendation method called TAPR (Time Aware POI Recommendation) is proposed. The method constructs different relation matrices based on different time scales, and uses tensor decomposition to decompose the constructed relation matrices and obtain representations of users and POIs. Finally, the method uses cosine similarity to calculate similarity scores between users and non-visited POIs, and combines them with a user preference modeling algorithm to obtain the final recommendation score. Experimental results on two public datasets show that the proposed TAPR performs better than other POI recommendation methods.
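
The cosine-similarity scoring step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the user and POI vectors are made-up placeholders standing in for representations that the paper obtains by tensor decomposition, and the helper names `cosine` and `rank_unvisited` are hypothetical. The combination with user preference modeling is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_unvisited(user_vec, poi_vecs, visited):
    """Rank POIs the user has not visited by descending cosine similarity."""
    scores = {
        poi: cosine(user_vec, vec)
        for poi, vec in poi_vecs.items()
        if poi not in visited
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings; in practice these would come from the decomposition step.
user_vec = [1.0, 0.0, 1.0]
poi_vecs = {
    "cafe": [1.0, 0.0, 1.0],
    "museum": [0.0, 1.0, 0.0],
    "park": [1.0, 1.0, 0.0],
    "gym": [0.5, 0.5, 0.5],
}
ranking = rank_unvisited(user_vec, poi_vecs, visited={"gym"})
print(ranking)
```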

Research on Urban Function Recognition Based on Multi-modal and Multi-level Data Fusion Method

ZHOU Xin-min, HU Yi-gui, LIU Wen-jie, SUN Rong-jun

Computer Science 2021, 48 (9): 50-58. DOI: 10.11896/jsjkx.210500220

The division and identification of urban functional areas is of great significance for analyzing the distribution of urban functional areas and understanding the internal spatial structure of cities. This has stimulated demand for multi-source geospatial data fusion, especially the fusion of urban remote sensing data and social sensing data. However, how to effectively fuse urban remote sensing and social sensing data remains a technical problem. To realize this fusion and improve the accuracy of urban function recognition, this paper takes remote sensing images and social sensing data as examples, introduces a multi-modal data fusion mechanism, and proposes a joint deep learning and ensemble learning model to infer urban regional functions. The model uses DenseNet and DPN networks to extract urban remote sensing image features and social sensing features from multi-source geospatial data, and carries out multi-level data fusion (feature fusion, decision fusion and hybrid fusion) to identify urban functions. The proposed model is verified on the URFC dataset. With hybrid fusion, the overall classification accuracy, Kappa coefficient and average F1 are 74.29%, 0.67 and 71.92%, respectively. Compared with the best classification method on single-modal data, these three evaluation indexes increase by 18.83%, 0.24 and 35.46%, respectively. The experimental results show that the data fusion model has better classification performance, so it can effectively fuse remote sensing image data and social sensing data and accurately identify urban regional functions.

On Aircraft Trajectory Type Recognition Based on Frequent Route Patterns

SONG Jia-geng, ZHANG Fu-sang, JIN Bei-hong, DOU Zhu-mei

Computer Science 2021, 48 (9): 59-67. DOI: 10.11896/jsjkx.210100014

With the development of global positioning and radar technology, more and more trajectory data can be collected. In particular, trajectories generated by aircraft, ships and migratory birds are complicated and varied, and are not subject to ground constraints. Recognizing the type of an aircraft trajectory is valuable for helping to identify the behavior and intention of flying objects. Specifically, on the basis of identifying frequent route patterns, the paper proposes a new method consisting of a frequent route pattern extraction algorithm and a convolutional neural network model. The extraction algorithm first gets key points from the compressed trajectory, next finds the closed routes through the self-intersecting points of the trajectory, then discovers frequent patterns in the closed routes and treats them as the basis of classification. The model then recognizes the trajectory type via image analysis. This paper conducts extensive experiments on real aircraft trajectory data published on the FlightRadar24 website as well as on simulated data. The experimental results show that the method can effectively identify complex trajectory types. Compared with LeNet-5 CNN classification without trajectory extraction, the method performs better, achieving an average trajectory classification accuracy of more than 95%.

Heterogeneous Information Network Embedding with Incomplete Multi-view Fusion

ZHENG Su-su, GUAN Dong-hai, YUAN Wei-wei

Computer Science 2021, 48 (9): 68-76. DOI: 10.11896/jsjkx.210500203

Heterogeneous information network (HIN) embedding maps complex heterogeneous information to a low-dimensional dense vector space, which is conducive to the calculation and storage of network data. Most existing multi-view-based HIN embedding methods consider multiple semantic relationships between nodes, but ignore the incompleteness of the views. Most views are incomplete, and directly fusing multiple incomplete views degrades the performance of the embedding model. To address this problem, we propose a novel HIN embedding model with incomplete multi-view fusion, named IMHE. The key idea of IMHE is to aggregate neighbors from other views to reconstruct the incomplete views. Since different views describe the same HIN, neighbors in other views can restore the structure information of the missing nodes. The IMHE model first generates node sequences in different views, and leverages the multi-head self-attention method to obtain single-view embeddings. For each incomplete view, IMHE finds the k-order neighbors of the missing nodes in other views, then aggregates the embeddings of these neighbors in the incomplete view to generate new embeddings for the missing nodes. Finally, IMHE uses multi-view canonical correlation analysis to obtain the joint embedding of nodes, thereby simultaneously extracting the hidden semantic relationships of multiple views. Experimental results on three real-world datasets show that the proposed method is superior to state-of-the-art methods.

Cost-sensitive Convolutional Neural Network Based Hybrid Method for Imbalanced Data Classification

HUANG Ying-qi, CHEN Hong-mei

Computer Science 2021, 48 (9): 77-85. DOI: 10.11896/jsjkx.200900013

Imbalanced classification is a common problem in the field of data mining. In general, a skewed data distribution makes the performance of a classifier unsatisfactory. As an efficient data mining tool, the convolutional neural network is widely used in classification tasks. However, if the training process is adversely affected by data imbalance, the classification accuracy on minority classes decreases. Aiming at the classification of two-class imbalanced data, this paper proposes a hybrid method for imbalanced classification based on cost-sensitive convolutional neural networks. The proposed method first combines the density peak clustering algorithm with SMOTE, and preprocesses the data through oversampling to reduce the imbalance of the original data set. Cost sensitivity is then used to give different weights to different categories in the imbalanced data. Additionally, the Euclidean distance between the predicted value and the label value is considered. The proposed method assigns different cost losses to the majority class and the minority class to construct a cost-sensitive convolutional neural network model, which improves the recognition rate of the convolutional neural network on minority classes. Six different datasets are used to verify the effectiveness of the proposed method. The experimental results show that the proposed method improves the classification performance of the convolutional neural network model on imbalanced data.
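
The idea of weighting classes differently and penalizing the Euclidean distance between prediction and label can be sketched as below. This is a minimal illustration, not the loss defined in the paper: the inverse-frequency weighting scheme, the function names and the sample predictions are all assumptions.

```python
def class_weights(labels):
    """Weight each class inversely to its frequency (an assumed scheme)."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def cost_sensitive_loss(probs, labels, weights):
    """Mean class-weighted squared Euclidean distance between the predicted
    class probabilities and the one-hot labels (two classes: 0 and 1)."""
    total = 0.0
    for p, y in zip(probs, labels):
        target = [1.0 if c == y else 0.0 for c in (0, 1)]
        dist_sq = sum((pi - ti) ** 2 for pi, ti in zip(p, target))
        total += weights[y] * dist_sq
    return total / len(labels)


labels = [0, 0, 0, 1]                                   # class 1 is the minority
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]  # hypothetical model outputs
weights = class_weights(labels)
loss = cost_sensitive_loss(probs, labels, weights)
print(weights, loss)
```

Because the single minority sample carries a weight three times larger than each majority sample, its misclassification dominates the loss, which is the effect a cost-sensitive scheme is meant to produce.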

Historical Driving Track Set Based Visual Vehicle Behavior Analytic Method

LUO Yue-tong, WANG Tao, YANG Meng-nan, ZHANG Yan-kong

Computer Science 2021, 48 (9): 86-94. DOI: 10.11896/jsjkx.200900040

With the continuous development of smart cities, vehicle tracks can be acquired automatically from traffic checkpoints, which lays a foundation for track-based vehicle behavior analysis. Since checkpoint positions are fixed, a vehicle trajectory is expressed as a sequence of checkpoints. Therefore, checkpoints and trajectories are first mapped to words and sentences respectively, and a semantic similarity method is used to calculate trajectory similarity. Then, based on the similarity of trajectories, trajectory entropy is proposed to measure the regularity of all trajectories of a vehicle. Finally, trajectory entropy is used to analyze the behavioral characteristics of vehicles. For example, a vehicle with low trajectory entropy drives in a particularly regular way and is likely to be a commuter vehicle. To facilitate in-depth analysis, this paper further provides a visual analysis system with multiple linked views, which allows users to compare vehicle trajectory entropy and combines clustering analysis with related interactions to help users find meaningful vehicle behavior: for example, commuter vehicles have low trajectory entropy, whereas taxi-like vehicles have very high trajectory entropy. By analyzing the checkpoint data set of Kunming city for February 2019, vehicle travel behaviors and their characteristics in different trajectory entropy intervals can be found effectively, which demonstrates the effectiveness of the proposed method.
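
To make the notion of trajectory entropy concrete, here is a simplified sketch. It swaps in plain Shannon entropy over exactly matching checkpoint (bayonet) sequences instead of the similarity-based grouping described in the abstract, so it is an approximation of the idea rather than the paper's definition; the checkpoint names and trips are hypothetical.

```python
import math
from collections import Counter


def trajectory_entropy(trips):
    """Shannon entropy (in bits) of a vehicle's distribution over distinct
    checkpoint sequences. Identical sequences count as the same route."""
    counts = Counter(tuple(trip) for trip in trips)
    total = len(trips)
    probs = [c / total for c in counts.values()]
    return sum(p * math.log2(1.0 / p) for p in probs)


# A commuter repeats one route; a taxi takes a different route every trip.
commuter = [["K1", "K2", "K3"]] * 4
taxi = [["K1", "K2"], ["K4", "K7"], ["K3", "K9"], ["K5", "K6"]]

print(trajectory_entropy(commuter))  # low entropy: highly regular driving
print(trajectory_entropy(taxi))      # high entropy: irregular driving
```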

Railway Passenger Co-travel Prediction Based on Association Analysis

LI Si-ying, XU Yang, WANG Xin, ZHAO Ruo-cheng

Computer Science 2021, 48 (9): 95-102. DOI: 10.11896/jsjkx.200700097

With the fast development of transportation technology, the railway has become one of people's main choices when traveling for business, vacation or visits. As a result, co-travel has become more and more common. Based on the co-travel relationship, one can construct a co-travel network, in which each node represents a passenger and each edge indicates the co-travel frequency between the two passengers it connects, and perform link prediction on the network so that personalized services and products can be better provided. In light of this, this paper proposes a novel approach to predicting potential co-travel relationships. Specifically, we first propose two types of co-travel graph pattern association rules, which are extended from their traditional counterparts and can be used to predict new co-travel relationships and co-travel frequency, respectively. We then decompose the mining problem into three sub-problems, i.e., frequent co-travel pattern mining, rule generation and association analysis, and develop parallel and centralized algorithms for these sub-problems. Extensive experimental studies on large real-life datasets show that our approach can predict potential co-travel relationships efficiently and accurately, with accuracies higher than 50% for both types of rules, substantially better than the traditional method (e.g., Jaccard, with an accuracy of 24%).
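
For context, the Jaccard baseline that this abstract compares against can be sketched on a small co-travel graph as follows. This shows only the traditional neighbor-overlap baseline, not the graph pattern association rules proposed in the paper; the passenger names, the graph and the threshold are hypothetical.

```python
def jaccard(graph, a, b):
    """Jaccard coefficient of the neighbor sets of passengers a and b."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def predict_links(graph, threshold):
    """Return (a, b, score) for non-adjacent pairs whose score meets the
    threshold, highest score first."""
    nodes = sorted(graph)
    candidates = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if b not in graph[a]:
                score = jaccard(graph, a, b)
                if score >= threshold:
                    candidates.append((a, b, score))
    return sorted(candidates, key=lambda t: -t[2])


# Hypothetical undirected co-travel network: an edge means two passengers
# have travelled together at least once.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
predictions = predict_links(graph, threshold=0.5)
print(predictions)
```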

Biased Deep Distance Factorization Algorithm for Top-N Recommendation

QIAN Meng-wei, GUO Yi

Computer Science 2021, 48 (9): 103-109. DOI: 10.11896/jsjkx.200800129

Since traditional matrix factorization algorithms are mostly based on shallow linear models, it is difficult for them to learn deep latent factors of users and items, and they are prone to overfitting when the dataset is sparse. To deal with this problem, this paper proposes a biased deep distance factorization algorithm, which not only addresses the data sparsity problem but also learns distance feature vectors with stronger representation capability. First, the interaction matrix is constructed from the explicit and implicit data of users and items, and is then converted into the corresponding distance matrix. Second, the distance matrix is fed, by row and by column respectively, into a deep neural network with bias layers, which learns the distance feature vectors of users and items with non-linear features. Finally, the distance between a user and an item is calculated from the distance feature vectors, and the Top-N recommendation list is generated according to the distance values. The experimental results show that the Precision, Recall, MAP, MRR and NDCG of this algorithm are significantly improved over other mainstream recommendation algorithms on four different datasets.

Smart Interactive Guide System for Big Data Analytics

YU Yue-zhang, XIA Tian-yu, JING Yi-nan, HE Zhen-ying, WANG Xiao-yang

Computer Science 2021, 48 (9): 110-117. DOI: 10.11896/jsjkx.200900083

Traditional big data tools are generally built for professional data analysts; they are difficult to get started with, offer poor interaction, and are not intelligent enough. The intelligent interactive guidance system is a set of auxiliary tools for big data analysis, developed to address these problems of current interactive big data analysis systems. The system not only develops core technologies such as user intention understanding, data sampling and column recommendation, visualization recommendation, and analysis method recommendation, but also offers a good graphical interface and a user-friendly intelligent interactive experience. While meeting users' multiple interactive analysis needs, it also responds very quickly. Users can go back to any step of the analysis process at any time to reselect a method and re-execute it, and the system can be quickly integrated with various analysis applications through its interface for deployment in different scenarios. In experimental tests, the average interaction time of the system is within 3 seconds, and the execution of interactions is about 3 times faster than with traditional analysis methods. In use case testing, users are also more satisfied with the system than with traditional tools. Through its exploration of ease of use, timeliness, interactivity, and intelligence, the smart interactive guide system allows users with different levels of background to complete their big data analysis goals.

Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark

DAI Hong-liang, ZHONG Guo-jin, YOU Zhi-ming, DAI Hong-ming

Computer Science 2021, 48 (9): 118-124. DOI: 10.11896/jsjkx.210400280

With the development of mobile Internet technology, social media has become the main channel for the public to share views and express emotions. Sentiment analysis of social media texts during major social events can effectively monitor public opinion. To address the low accuracy and efficiency of existing Chinese social media sentiment analysis algorithms, an ensemble sentiment analysis big data method (S-FWS) based on the Spark distributed system is proposed. First, new words are discovered by calculating the PMI association degree after pre-segmentation with the Jieba library. Then, text features are extracted by considering the importance of words, and feature selection is performed with Lasso. Finally, because the traditional Stacking framework neglects feature importance, the accuracy information of the primary learners is used to weight the probabilistic features, and polynomial features are constructed to train the secondary learner. A variety of algorithms are run in stand-alone mode and on the Spark platform, respectively, for comparative experiments. Results show that the S-FWS method proposed in this paper has certain advantages in accuracy and time consumption. The distributed system greatly improves the operating efficiency of the algorithms, and as the number of working nodes increases, the time consumption of the algorithms gradually decreases.
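
The PMI association degree used for new word discovery can be sketched as follows, using the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) over adjacent tokens. The sketch assumes the text has already been segmented (the paper uses Jieba for that step); the function name, the toy tokens and the probability estimates from raw counts are assumptions for illustration.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Compute PMI for every pair of adjacent tokens in pre-segmented sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy pre-segmented corpus: "co" and "vid" always appear together, so the
# pair is a candidate new word that the segmenter split apart.
sentences = [
    ["co", "vid", "news"],
    ["co", "vid", "cases"],
    ["news", "cases"],
]
scores = pmi_scores(sentences)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

Pairs with a high PMI co-occur far more often than their individual frequencies would suggest, so they are reasonable candidates to merge into a single new word before feature extraction.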