Computer Science, 2024, Vol. 51, Issue (9): 233-241. doi: 10.11896/jsjkx.230900159

• Artificial Intelligence •

CFGT: A Lexicon-based Chinese Address Element Parsing Model

HUANG Wei, SHEN Yaodi, CHEN Songling, FU Xiangling   

  1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
    Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China
  • Received: 2023-09-28 Revised: 2024-03-14 Online: 2024-09-15 Published: 2024-09-10
  • About author: HUANG Wei, born in 1998, postgraduate. His main research interests include data mining and anomaly detection.
    FU Xiangling, born in 1975, Ph.D, professor, Ph.D supervisor. Her main research interests include natural language processing, smart finance and smart healthcare.
  • Supported by:
    National Natural Science Foundation of China (72274022).

Abstract: As a key step in the geocoding process, address element parsing directly affects the accuracy of geocoding. Because Chinese address expressions are diverse and complex, two similar address texts may differ completely in their geographical meaning. Traditional address element parsing based on dictionary matching handles ambiguous words poorly and therefore shows low recognition accuracy. This paper proposes CFGT (collaborative flat-graph transformer), a lexicon-based Chinese address element parsing model that uses self-matched words, nearest contextual words, and other lexical information to enhance the character sequence representation of address text, effectively reducing the ambiguity of address expressions. Specifically, the model first constructs two collaboration graphs, flat-lattice and flat-shift, to capture the knowledge of self-matched words and nearest contextual words for address characters, and designs a fusion layer to implement collaboration between the graphs. Second, an improved relative position encoding further strengthens the enhancing effect of word information on the character sequence of the address text. Finally, a Transformer and conditional random fields are used to parse the address elements. Experiments are conducted on multiple public datasets such as Weibo and Resume, as well as the private dataset Address. The results show that CFGT outperforms previous Chinese address element parsing models and existing models in the field of Chinese named entity recognition.
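
The idea of self-matched words mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `self_matched_words`, the toy lexicon, the sample address, and the `max_len` cutoff are all assumptions made for illustration. The sketch only shows how each character of an address string can be linked to the lexicon words that cover it, together with the start and end character positions of each word, which is the kind of span information flat-lattice approaches attach to matched words.

```python
from collections import defaultdict


def self_matched_words(text, lexicon, max_len=4):
    """Map each character index to the lexicon words whose span covers it.

    Each match is stored as (word, start_index, end_index), with both
    indices inclusive. This is a hypothetical helper for illustration only.
    """
    matches = defaultdict(list)
    n = len(text)
    for start in range(n):
        # Try every candidate substring of length 2..max_len starting here.
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append((word, start, end - 1))
    return dict(matches)


# Toy lexicon and address text (assumed, not taken from the paper).
lexicon = {"北京", "北京市", "海淀区", "西土城路"}
text = "北京市海淀区西土城路10号"
result = self_matched_words(text, lexicon)

for index, char in enumerate(text):
    print(index, char, result.get(index, []))
```

In this toy example the first character is covered by two overlapping words, "北京" and "北京市", which is exactly the kind of ambiguity that plain dictionary matching cannot resolve on its own and that a model must settle from context.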

Key words: Chinese address recognition, Lexicon enhancement, External information, Named entity recognition

CLC Number: TP391