Computer Science, 2022, Vol. 49, Issue (11): 185-196. doi: 10.11896/jsjkx.211100063

• Artificial Intelligence •

Methods of Patent Knowledge Graph Construction

DENG Liang1,2,3, CAO Cun-gen4

  1 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  2 Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
  3 Patent Office, China National Intellectual Property Administration, Beijing 100083, China
  4 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2021-11-05 Revised: 2022-03-11 Online: 2022-11-15 Published: 2022-11-03
  • Corresponding author: CAO Cun-gen (cgcao@ict.ac.cn)
  • About author: DENG Liang (dengliang@cnipa.gov.cn), born in 1980, postgraduate. His main research interests include deep learning and knowledge graphs.
    CAO Cun-gen, born in 1964, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include large-scale knowledge processing.

Abstract: Patent knowledge graphs play an important role in applications such as precise patent retrieval, in-depth patent analysis and patent knowledge training. This paper proposes a practical method for constructing a patent knowledge graph based on a seed knowledge graph, text mining and relation completion. To ensure quality, a seed patent knowledge graph is first built manually. Concept and relation extraction based on patent text patterns is then used to extend the seed graph, and finally the extended patent knowledge graph is evaluated quantitatively. For patents in the field of traditional Chinese medicine, seed knowledge is extracted manually and lexical-syntactic patterns are summarized manually. Machine learning is then used to learn new lexical-syntactic patterns, with which the seed patent knowledge graph is extended and completed. Experimental results show that the seed knowledge graph of traditional Chinese medicine patents contains 19 453 nodes and 194 775 relations. After extension, these numbers reach 558 461 and 7 275 958, increases of 27.7 times and 36.3 times respectively.
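The reported growth figures can be checked directly from the four counts given in the abstract. The short Python sketch below is only an arithmetic illustration: the helper name `fold_increase` and the variable names are hypothetical and do not come from the paper.

```python
def fold_increase(before: int, after: int) -> float:
    """Return the relative increase, (after - before) / before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before


# Counts taken from the abstract (seed graph vs. extended graph).
seed_nodes, seed_relations = 19_453, 194_775
extended_nodes, extended_relations = 558_461, 7_275_958

node_increase = fold_increase(seed_nodes, extended_nodes)
relation_increase = fold_increase(seed_relations, extended_relations)

print(f"nodes:     {node_increase:.2f}-fold increase")
print(f"relations: {relation_increase:.2f}-fold increase")
```

The results are about 27.71 and 36.36. They match the reported 27.7 and 36.3 if the second value is truncated rather than rounded, and they confirm that the abstract reports the increase over the seed graph, not the ratio of extended size to seed size.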

Key words: Patent text, Patent knowledge graph, Lexical and syntactic analysis, Representation learning

CLC Number:

  • TP391