Computer Science ›› 2021, Vol. 48 ›› Issue (2): 93-99. doi: 10.11896/jsjkx.200900039
刘立成, 徐一凡, 谢贵才, 段磊
LIU Li-cheng, XU Yi-fan, XIE Gui-cai, DUAN Lei
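Abstract: With the development of information technology, data in materials science and related fields have become multi-source and heterogeneous, highly extensible, and explosively growing, and traditional relational databases are ill-suited to storing them. The schema-less storage and high scalability of NoSQL databases can address this problem. JSON, a data storage format commonly used by NoSQL databases, is popular for its simplicity and flexibility. However, because NoSQL databases lack schema information, JSON documents need to be validated and analyzed before they are stored. At present, most methods use JSON Schema to check that a JSON document's format conforms to the specification, which cannot effectively address anomaly detection and semantic ambiguity in JSON documents. To this end, this paper proposes doctorJSON, a JSON document anomaly detection and semantic disambiguation model for NoSQL databases. Based on JSON Schema, the model provides an anomaly detection algorithm, deoutJSON, and a semantic disambiguation algorithm, disemaJSON, to detect anomalies and ambiguities in incoming JSON documents. Experiments on real and synthetic datasets verify the effectiveness and execution efficiency of the proposed model.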
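The gap the abstract describes, where a document can conform to its schema and still be anomalous or ambiguous, can be illustrated with a minimal sketch. This is not the deoutJSON or disemaJSON algorithm; the schema, the field names (`material`, `melting_point_k`, `mp`), and the `validate` helper are hypothetical, and the checker covers only a small subset of JSON Schema keywords (`type`, `required`, `properties`).

```python
import json

# Hypothetical, hand-written schema using a small subset of JSON Schema
# keywords; it is an illustration, not a schema from the paper.
SCHEMA = {
    "type": "object",
    "required": ["material", "melting_point_k"],
    "properties": {
        "material": {"type": "string"},
        "melting_point_k": {"type": "number"},
    },
}

# Mapping from JSON Schema type names to Python types.
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate(doc, schema, path="$"):
    """Return structural errors; an empty list means the document conforms."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(doc, TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(doc, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(doc, dict):
        for key in schema.get("required", []):
            if key not in doc:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in doc:
                errors.extend(validate(doc[key], sub, f"{path}.{key}"))
    return errors


well_formed = json.loads('{"material": "Fe", "melting_point_k": 1811}')
wrong_type = json.loads('{"material": "Fe", "melting_point_k": "1811"}')
# Structurally valid, but the value is physically implausible (negative kelvin).
implausible = json.loads('{"material": "Fe", "melting_point_k": -1811}')
# Structurally valid, but "mp" may restate the melting point under another
# name and unit (1538 degrees Celsius equals 1811 K).
ambiguous = json.loads('{"material": "Fe", "melting_point_k": 1811, "mp": 1538}')

for name, doc in [("well_formed", well_formed), ("wrong_type", wrong_type),
                  ("implausible", implausible), ("ambiguous", ambiguous)]:
    print(name, validate(doc, SCHEMA))
```

Only `wrong_type` is rejected. The implausible value and the possibly redundant `mp` key both pass the structural check, which is the kind of case that format validation alone does not catch.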