Computer Science (计算机科学) ›› 2023, Vol. 50 ›› Issue (3): 276-281. doi: 10.11896/jsjkx.220200020
LIU Pan (刘盼)1, GUO Yanming (郭延明)1, LEI Jun (雷军)1, LAO Mingrui (老明瑞)2, LI Guohui (李国辉)1
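Abstract: Unlike English, which is naturally segmented into words, Chinese has no word delimiters and its characters combine into words more flexibly, so entity boundaries are harder to determine in named entity recognition (NER). Current mainstream methods cast NER as a sequence labeling task. This paper adopts the BIOES tagging scheme and studies the predicted label sequences. First, entity boundary accuracy is computed by comparing the entity head label B or the tail label E in isolation; the results show that improving boundary accuracy further improves entity recognition accuracy. Second, the boundaries of entities with consecutive labels are extended and relocated, the type label of an entity's last character is used to correct the entity type, and word segmentation information is used to fill in entities whose labels are incomplete. Finally, a BIO+ES tagging scheme with added boundary markers is proposed to distinguish the non-entity characters at entity boundaries, further improving Chinese NER performance.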
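Two of the steps named in the abstract, scoring head (B) or tail (E) labels on their own and correcting an entity's type from its last character, can be illustrated with a minimal Python sketch. The function names, the exact accuracy definition (share of gold boundary tags matched at the same position, ignoring the type suffix), and the toy tag sequences are assumptions made for illustration; the abstract does not give the paper's formulas or implementation.

```python
from typing import List


def boundary_accuracy(gold: List[str], pred: List[str], marker: str) -> float:
    """Share of gold tags with prefix `marker` ("B" or "E") whose position
    carries the same prefix in the prediction.

    The type suffix is ignored so that only the boundary is scored.
    Assumed definition, not the paper's formula.
    """
    positions = [i for i, tag in enumerate(gold) if tag.split("-")[0] == marker]
    if not positions:
        return 0.0
    hits = sum(1 for i in positions if pred[i].split("-")[0] == marker)
    return hits / len(positions)


def correct_types_by_last_char(tags: List[str]) -> List[str]:
    """Relabel each B...E span with the type carried by its final (E) tag.

    Assumes tags look like "B-LOC", "I-LOC", "E-LOC", "S-LOC" or "O".
    """
    fixed = list(tags)
    start = None  # index of the most recent unclosed B tag, if any
    for i, tag in enumerate(tags):
        prefix = tag.split("-")[0]
        if prefix == "B":
            start = i
        elif prefix == "E" and start is not None and "-" in tag:
            entity_type = tag.split("-", 1)[1]
            for j in range(start, i + 1):
                fixed[j] = fixed[j].split("-")[0] + "-" + entity_type
            start = None
        elif prefix in ("O", "S"):
            start = None  # the span was interrupted, so leave it untouched
    return fixed


# Toy sequences (hypothetical, not taken from the paper's data).
gold = ["B-LOC", "I-LOC", "E-LOC", "O", "B-PER", "E-PER"]
pred = ["B-ORG", "I-ORG", "E-LOC", "O", "O", "E-PER"]

head_acc = boundary_accuracy(gold, pred, "B")   # one of two heads found
tail_acc = boundary_accuracy(gold, pred, "E")   # both tails found
corrected = correct_types_by_last_char(pred)
```

In this toy case the first predicted span mixes ORG and LOC labels, and the correction step relabels it consistently as LOC because its last character is tagged E-LOC. The trailing E-PER has no matching B, so it is left as is; filling such incomplete spans is the job the abstract assigns to word segmentation information, which this sketch does not attempt.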