Study on Chinese Named Entity Extraction Rules Based on Boundary Location and Correction

doi:10.11896/jsjkx.220200020

Computer Science, 2023, Vol. 50, Issue (3): 276-281.

• Artificial Intelligence •

LIU Pan¹, GUO Yanming¹, LEI Jun¹, LAO Mingrui², LI Guohui¹

1 College of Systems Engineering, National University of Defense Technology, Changsha 410000, China
2 LIACS Media Lab, Leiden University, Leiden 2333CA, The Netherlands

Received: 2022-02-01 Revised: 2022-05-13 Online: 2023-03-15 Published: 2023-03-15
About author: LIU Pan, born in 1990, postgraduate. His main research interests include natural language processing, computer vision and deep learning.
GUO Yanming, born in 1989, Ph.D., associate professor. His main research interests include computer vision, natural language processing and deep learning.
Supported by: National Natural Science Foundation of China (61806218, 71673293) and Natural Science Foundation of Hunan Province, China (2019JJ50722).

Abstract

Unlike English text, which is naturally composed of words, Chinese text has no word delimiters, so Chinese characters combine more flexibly and entity boundaries are harder to determine in Chinese named entity recognition (NER). Current mainstream methods cast NER as a sequence labeling task. This paper studies the predicted label sequences under the BIOES tag scheme and calculates the entity boundary accuracy by considering the entity head label B or the tail label E separately. The results show that increasing the boundary accuracy can further improve the accuracy of entity recognition. We expand the boundaries of entities with continuous labels, use the label type of the last character of the entity to correct the entity type, and use word segmentation information to fill in entities with incomplete labels. Finally, this paper proposes a BIO⁺ES labeling scheme that adds boundary labels to distinguish non-entity characters at entity boundaries, which further improves the performance of Chinese NER.
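The rules named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact rule definitions, so the function names (`strict_spans`, `lenient_spans`, `boundary_accuracy`), the way a run of continuous labels is closed, and the definition of boundary accuracy as the share of gold B (or E) positions predicted correctly are all assumptions made for illustration. The word segmentation fill-in rule and the BIO⁺ES scheme are not covered here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index inclusive, entity type)


def split_label(label: str) -> Tuple[str, str]:
    """Split 'B-PER' into ('B', 'PER'); 'O' becomes ('O', '')."""
    if label == "O":
        return "O", ""
    prefix, _, etype = label.partition("-")
    return prefix, etype


def strict_spans(labels: List[str]) -> List[Span]:
    """Baseline decoding: accept only well-formed B..I..E runs and S tags."""
    spans, start, etype = [], None, ""
    for i, label in enumerate(labels):
        prefix, t = split_label(label)
        if prefix == "S":
            spans.append((i, i, t))
            start = None
        elif prefix == "B":
            start, etype = i, t
        elif prefix == "I" and start is not None and t == etype:
            continue  # still inside a consistent entity
        elif prefix == "E" and start is not None and t == etype:
            spans.append((start, i, t))
            start = None
        else:
            start = None  # 'O' or an inconsistent label breaks the entity
    return spans


def lenient_spans(labels: List[str]) -> List[Span]:
    """Assumed rule: treat each run of continuous non-O labels as one entity
    and take the entity type from the label of its last character."""
    spans, start, n = [], None, len(labels)
    for i in range(n):
        prefix, etype = split_label(labels[i])
        if prefix == "O":
            continue
        if start is None:
            start = i
        next_prefix = split_label(labels[i + 1])[0] if i + 1 < n else "O"
        # Close the run at an explicit tail (E/S), or before O or a new head (B/S).
        if prefix in ("E", "S") or next_prefix in ("O", "B", "S"):
            spans.append((start, i, etype))
            start = None
    return spans


def boundary_accuracy(gold: List[str], pred: List[str], prefix: str) -> float:
    """Assumed metric: share of gold positions with the given prefix
    ('B' for heads, 'E' for tails) whose predicted label matches exactly."""
    positions = [i for i, g in enumerate(gold) if split_label(g)[0] == prefix]
    if not positions:
        return 0.0
    return sum(pred[i] == gold[i] for i in positions) / len(positions)


# Hypothetical gold and predicted label sequences for one sentence.
gold = ["B-PER", "I-PER", "E-PER", "O", "B-LOC", "E-LOC", "B-ORG", "E-ORG"]
pred = ["B-PER", "I-PER", "I-ORG", "O", "I-LOC", "E-LOC", "B-ORG", "E-ORG"]

print(strict_spans(pred))
print(lenient_spans(pred))
print(boundary_accuracy(gold, pred, "B"), boundary_accuracy(gold, pred, "E"))
```

In this made-up example the strict decoder discards two predicted entities because one lacks a tail label and the other lacks a head label, while the lenient rules recover both spans. The first recovered span takes the type of its last character, which here is the wrong type, so the sketch also shows that such rules repair boundaries without guaranteeing a correct type.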

Key words: Chinese named entity recognition, Tag scheme, Entity extraction

CLC Number: TP391

LIU Pan, GUO Yanming, LEI Jun, LAO Mingrui, LI Guohui. Study on Chinese Named Entity Extraction Rules Based on Boundary Location and Correction[J]. Computer Science, 2023, 50(3): 276-281.
