Computer Science, 2025, Vol. 52, Issue (11A): 241000154-6. doi: 10.11896/jsjkx.241000154
YANG Hua, WANG Baohui
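Abstract: Preparing and reviewing bidding documents are key steps in ensuring that a tendering process runs smoothly. Applied to the review of bidding documents, entity recognition can markedly improve the accuracy and efficiency of information extraction and make the extracted information easier to read and search. However, bidding documents are complex, dense with specialized terminology, and contain long entities that are hard to recognize, so traditional named entity recognition (NER) methods perform poorly on this task. To address this, we propose an NER method that combines a multi-head attention mechanism, lexical feature fusion, and a RoBERTa-based BiLSTM-CRF model, abbreviated RoBERTa-DFF-BiLSTM-MHA-CRF. The method uses RoBERTa as the base input layer, which improves its ability to capture long-distance dependencies; it introduces multi-head self-attention to further strengthen the recognition of long-span entities; and it fuses dictionary features built from domain terminology to resolve the unclear boundaries of technical terms. Experimental results show that on the NER task for bidding documents the model markedly improves the accuracy and efficiency of information extraction: compared with BERT-BiLSTM-CRF, it raises Precision by 2.49 percentage points, Recall by 4.28 percentage points, and F1 by 3.37 percentage points. The method reduces time and labor costs and offers an efficient new approach to information extraction from bidding and tender documents.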
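The abstract does not say how the domain-dictionary features are encoded. One common scheme, assumed here purely for illustration, matches each sentence against a terminology lexicon and gives every character a B/I/E/S/O tag that can then be embedded and concatenated with the RoBERTa output. The sketch below uses greedy forward maximum matching; the function name `dict_feature_tags`, the `max_len` parameter, and the sample lexicon are hypothetical, not taken from the paper.

```python
from typing import List, Set


def dict_feature_tags(sentence: str, lexicon: Set[str], max_len: int = 10) -> List[str]:
    """Tag each character B/I/E/S if it lies inside a lexicon term, else O.

    Uses greedy forward maximum matching (an assumed scheme, for illustration).
    """
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Try the longest candidate first so longer terms win over their substrings.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in lexicon:
                match_len = length
                break
        if match_len == 0:
            i += 1  # no term starts here; leave this character as O
            continue
        if match_len == 1:
            tags[i] = "S"  # single-character term
        else:
            tags[i] = "B"
            for k in range(i + 1, i + match_len - 1):
                tags[k] = "I"
            tags[i + match_len - 1] = "E"
        i += match_len  # skip past the matched term
    return tags


if __name__ == "__main__":
    # Hypothetical lexicon: "bidding documents", "bid bond", "deposit".
    lexicon = {"招标文件", "投标保证金", "保证金"}
    # "The bid bond shall be submitted within the time specified in the bidding documents."
    tags = dict_feature_tags("投标保证金应在招标文件规定时间内提交", lexicon)
    print("".join(tags))  # prints BIIIEOOBIIEOOOOOOO
```

Because matching is greedy and longest-first, the shorter term 保证金 is absorbed by 投标保证金 rather than tagged separately; lexicon-enhanced NER models often keep all overlapping matches instead, and which variant suits this task is an open design choice.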
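At inference time, the CRF output layer selects the highest-scoring tag sequence given per-token emission scores (in this architecture, produced by the BiLSTM and attention layers) and a learned tag-transition matrix. The abstract gives no implementation details, so the following is a generic linear-chain Viterbi decoder in plain Python with made-up scores; the tag set {0: O, 1: B, 2: I} and all numbers are illustrative assumptions.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (best_score, best_tag_path) for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (sequence must be non-empty)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers = []           # backpointers[t-1][j] = best previous tag for tag j at t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[best_last], path


if __name__ == "__main__":
    O, B, I = 0, 1, 2
    emissions = [[1.0, 0.5, 0.0],   # token 0 slightly prefers O
                 [0.0, 0.0, 1.0]]   # token 1 prefers I
    transitions = [[0.0] * 3 for _ in range(3)]
    transitions[O][I] = -100.0      # forbid O -> I, which is invalid under BIO
    print(viterbi_decode(emissions, transitions))  # prints (1.5, [1, 2])
```

A per-token argmax here would return [0, 2] (O followed by I), an invalid BIO sequence; the transition penalty makes the decoder return [1, 2] (B, I). Enforcing label consistency across neighboring tokens is the usual reason a CRF is placed on top of a BiLSTM encoder.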
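The reported gains are in Precision, Recall, and F1. For NER these metrics are usually computed at entity level: a predicted entity counts as correct only if its span and type both match a gold entity exactly. The abstract does not state the matching criterion, so the strict, micro-averaged variant sketched below is an assumption, as are the function names and the BIO tag types (ORG, MONEY, DATE).

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end_exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from a BIO sequence such as ['B-ORG', 'I-ORG', 'O']."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue                         # current entity continues
        if etype is not None:
            spans.add((etype, start, i))     # close the open entity
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]        # a stray I- also starts a new entity
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 with strict matching."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-ORG", "I-ORG", "O", "B-MONEY", "I-MONEY"], ["O", "B-DATE", "I-DATE"]]
    pred = [["B-ORG", "I-ORG", "O", "B-MONEY", "O"], ["B-ORG", "B-DATE", "I-DATE"]]
    p, r, f = prf1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=0.500 R=0.667 F1=0.571
```

In this toy case two of the four predicted entities are correct and two of the three gold entities are found, so P = 0.5, R = 2/3, and F1 = 4/7. The truncated MONEY span earns no partial credit under strict matching, which is why long entities, the difficulty the abstract highlights, weigh heavily on these scores.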