Computer Science, 2021, Vol. 48, Issue (2): 93-99. doi: 10.11896/jsjkx.200900039

• Database & Big Data & Data Science •

Outlier Detection and Semantic Disambiguation of JSON Document for NoSQL Database

LIU Li-cheng, XU Yi-fan, XIE Gui-cai, DUAN Lei   

  1. School of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2020-08-04 Revised: 2020-09-25 Online: 2021-02-15 Published: 2021-02-04
  • About author: LIU Li-cheng, born in 1995, master's candidate, is a student member of China Computer Federation. His main research interests include data mining and knowledge engineering.
    DUAN Lei, born in 1981, Ph.D., professor, Ph.D. supervisor, is a senior member of China Computer Federation. His main research interests include data mining, health informatics and evolutionary computation.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61972268).

Abstract: With the development of information technology, data in materials science and related fields has become multi-source and heterogeneous, strongly scalable, and explosive in growth, so traditional relational databases are no longer suitable for storing it. NoSQL databases, with their schemaless storage and high scalability, can address this problem. JSON, a common data storage format for NoSQL databases, is popular for its simplicity and flexibility. However, because NoSQL databases lack schema information, JSON documents need to be validated and analyzed before being stored in the database. At present, most methods use JSON schema to verify that the format of a JSON document is normalized, which cannot effectively address outlier detection and semantic ambiguity in JSON documents. Therefore, a JSON document outlier detection and semantic disambiguation model for NoSQL databases, named doctorJSON, is proposed. Based on JSON schema, the model designs an outlier detection algorithm, deoutJSON, and a semantic disambiguation algorithm, disemaJSON, to detect outliers and resolve semantic ambiguity in JSON documents. Experiments on real and synthetic datasets verify the validity and efficiency of the model.
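
As an illustration of the gap the abstract describes, the following minimal Python sketch flags structurally unusual fields in a small collection of JSON-like documents. It is not the deoutJSON algorithm, whose details are not given here; the function names `field_paths` and `rare_fields`, the `min_support` threshold, and the sample documents are all hypothetical and chosen only for this example. The idea shown is a simple frequency heuristic: a field path that appears in only a small share of the documents is reported as a possible outlier, even though every value would pass a type-level format check.

```python
from collections import Counter


def field_paths(doc, prefix=""):
    """Yield the dotted path of every key in a nested JSON-like dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from field_paths(value, path)


def rare_fields(docs, min_support=0.5):
    """Map document index -> field paths found in fewer than
    min_support * len(docs) documents (an assumed, illustrative rule)."""
    # Count in how many documents each path occurs (once per document).
    counts = Counter(p for d in docs for p in set(field_paths(d)))
    threshold = min_support * len(docs)
    report = {}
    for i, d in enumerate(docs):
        rare = sorted(p for p in set(field_paths(d)) if counts[p] < threshold)
        if rare:
            report[i] = rare
    return report


# Hypothetical materials records; the last one uses a misspelled field name.
docs = [
    {"name": "steel", "density": 7.85},
    {"name": "copper", "density": 8.96},
    {"name": "aluminium", "density": 2.70},
    {"name": "titanium", "dens": 4.51},
]
print(rare_fields(docs))  # {3: ['dens']}
```

In this sketch the fourth record is well-formed JSON with a numeric value, so a purely format-oriented check would accept it; only the comparison against the rest of the collection reveals that `dens` is unusual. Deciding whether `dens` actually means `density` is a separate semantic question that this heuristic does not attempt to answer.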

Key words: JSON document, JSON schema, NoSQL database, Outlier detection, Semantic disambiguation

CLC Number: TP311