Computer Science, 2023, Vol. 50, Issue (7): 229-236. doi: 10.11896/jsjkx.220500068

• Artificial Intelligence •

Recognition Method of Component Names in Patent Documents Based on the Algorithm of Word Frequency Difference and Library of Left-segmentation Words

KONG Jiabin, LYU Jianwen, LIU Jiangnan, DU Wenxuan   

  1. State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Changsha 410082, China
  • Received: 2022-05-07 Revised: 2022-10-23 Online: 2023-07-15 Published: 2023-07-05
  • About author: KONG Jiabin, born in 1996, postgraduate. His main research interests include mechanical equipment innovation design and patent knowledge mining. LIU Jiangnan, born in 1965, Ph.D., professor, master supervisor. Her main research interests include innovative design theory and methods, mechanical system optimization methods, patent avoidance and regeneration.
  • Supported by:
    Innovation Methods Work Special Projects of Science and Technology of China (2019IM050100) and Natural Science Foundation of Hunan Province, China (2018JJ2039).

Abstract: Mechanical patent literature contains a large amount of domain knowledge, in which component names serve as information units. The word formation of component names is flexible and changeable, so the names tend to be unique, complex and unfamiliar. The difficulty computers have in recognizing component names accurately is an obstacle to patent knowledge mining. To recognize component names efficiently, the word-formation features of patent text statements are analyzed and extracted. Starting from external words related to component names, the characters on the left side of the appended drawing reference signs (ADRS) are identified. Candidate names are then retrieved automatically from the text, and a set of candidate names is constructed. A word frequency difference algorithm is proposed to filter redundant characters from the set of candidate names. A left-segmentation library (LSL) is built dynamically to eliminate the redundant characters that remain after filtering. Cross-over experiments are used to test and analyze the influence of the character frequency difference prior threshold (CFDV-Ⅰ), the word frequency threshold (LSWF) and the character frequency difference threshold (CFDV-Ⅱ) on the recognition result. On this basis, a three-stage comprehensive method for recognizing component names in patent documents in the mechanical field is proposed. Comparison of the experimental results shows that the method is effective and efficient.
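The abstract does not give the algorithm itself, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that reference signs are plain numerals, that candidates are the characters immediately to the left of each sign, and that a redundant prefix can be spotted by a sharp drop in how often a longer left-extension occurs. The names `extract_candidates`, `trim_by_frequency_drop`, `window` and `max_drop` are hypothetical and do not correspond to the paper's CFDV-Ⅰ, LSWF or CFDV-Ⅱ thresholds.

```python
import re
from collections import Counter


def extract_candidates(text, window=6):
    """Collect (sign, left_context) pairs for each numeric reference sign.

    Assumption: a reference sign is a run of digits, and the candidate name
    is at most `window` characters to its left, cut at punctuation or digits.
    """
    pairs = []
    for match in re.finditer(r"\d+", text):
        left = text[max(0, match.start() - window):match.start()]
        # Keep only the part after the last punctuation mark, space or digit.
        left = re.split(r"[，。、；,.;\s\d]", left)[-1]
        if left:
            pairs.append((match.group(), left))
    return pairs


def trim_by_frequency_drop(candidates, max_drop=1):
    """Trim leading characters where the suffix frequency drops sharply.

    Assumption (illustrative only): the true name is the longest suffix whose
    frequency across all candidates stays within `max_drop` of the next
    shorter suffix. Extending past it adds context-specific characters.
    """
    suffix_counts = Counter(
        cand[-n:] for cand in candidates for n in range(1, len(cand) + 1)
    )
    trimmed = []
    for cand in candidates:
        keep = 1
        for n in range(2, len(cand) + 1):
            drop = suffix_counts[cand[-(n - 1):]] - suffix_counts[cand[-n:]]
            if drop > max_drop:
                break  # the extra left character is likely redundant
            keep = n
        trimmed.append(cand[-keep:])
    return trimmed


if __name__ == "__main__":
    print(extract_candidates("所述驱动轴12与齿轮3连接"))
    print(trim_by_frequency_drop(["所述驱动轴", "的驱动轴", "与驱动轴", "过驱动轴"]))
```

In this sketch a candidate that occurs only once is never trimmed, because there is no frequency contrast to exploit. That is the kind of leftover redundancy a second stage, such as the left-segmentation library described in the abstract, would need to handle.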

Key words: Patent text, Redundant characters, Appended drawing reference signs, Word frequency difference, Left-segmentation words

CLC Number: 

  • TH122