Computer Science (计算机科学), 2024, Vol. 51, Issue 9: 233-241. doi: 10.11896/jsjkx.230900159
黄威, 沈耀迪, 陈松龄, 傅湘玲
HUANG Wei, SHEN Yaodi, CHEN Songling, FU Xiangling
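Abstract: Address element parsing is a key step in geocoding and directly affects geocoding accuracy. Because Chinese addresses are expressed in diverse and complex ways, two similar address strings can refer to entirely different geographic locations. Traditional address element parsing methods based on dictionary matching handle ambiguous words poorly, which leads to low recognition accuracy. This paper proposes a lexicon-based Chinese address element parsing model, the Collaborative Flat-Graph Transformer (CFGT), which uses lexical information such as self-matched words and nearest contextual words to enhance the character-sequence representation of address text, effectively curbing the ambiguity of address expressions. Specifically, the model first constructs two collaborative graphs, Flat-Lattice and Flat-Shift, which capture knowledge of self-matched words and nearest contextual words for each address character, and designs a fusion layer to realize collaboration between the graphs. Second, an improved relative position encoding further strengthens the enhancement that word information brings to the address character sequence. Finally, a Transformer and a conditional random field (CRF) perform the address element parsing. Experiments on several public datasets, including Weibo and Resume, and on a private Address dataset show that CFGT outperforms existing Chinese address element parsing models and Chinese named entity recognition models.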
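The idea of attaching self-matched words to address characters can be illustrated with a minimal sketch. It assumes a flat-lattice layout in which every character and every lexicon-matched word becomes one token carrying head and tail character positions. The function names, the toy lexicon, and the `max_word_len` parameter are hypothetical, and the sketch does not cover the Flat-Shift graph, the fusion layer, or the improved relative position encoding, whose details the abstract does not give.

```python
from typing import List, Set, Tuple

# A token is (text, head index, tail index); for a single character head == tail.
Token = Tuple[str, int, int]


def build_flat_lattice(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[Token]:
    """Return the characters of `text` followed by all lexicon-matched words."""
    chars: List[Token] = [(ch, i, i) for i, ch in enumerate(text)]
    words: List[Token] = []
    for start in range(len(text)):
        # Only multi-character spans count as words here (assumption of this sketch).
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > len(text):
                break
            candidate = text[start:end]
            if candidate in lexicon:
                words.append((candidate, start, end - 1))
    return chars + words


def self_matched_words(lattice: List[Token], char_index: int) -> List[str]:
    """Return the lexicon words whose span covers the character at `char_index`."""
    return [w for (w, head, tail) in lattice if head != tail and head <= char_index <= tail]


# Toy example: an ambiguous string that can be segmented in more than one way.
text = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}  # hypothetical lexicon
lattice = build_flat_lattice(text, lexicon)
covering = self_matched_words(lattice, 3)  # words covering the character "长"
```

In this toy case the character "长" is covered by competing words ("市长" and "长江"), which is the kind of ambiguity that lexical information is meant to help a downstream encoder resolve.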