Computer Science ›› 2024, Vol. 51 ›› Issue (10): 178-186. doi: 10.11896/jsjkx.230800191

• Computer Software •

Study on Building Business-oriented Resource On-demand Resolution Model

LIU Yao1, QIN Xun2, LIU Tianji2

  1 Engineering Center, Institute of Scientific and Technical Information of China, Beijing 100038, China
  2 School of Software and Microelectronics, Peking University, Beijing 102600, China
  • Received: 2023-08-30 Revised: 2024-01-22 Online: 2024-10-15 Published: 2024-10-11
  • Corresponding author: LIU Yao (liuy@istic.ac.cn)
  • About author: LIU Yao, born in 1972, Ph.D., researcher, is a distinguished member of CCF (No. 17606D). His main research interests include natural language processing, knowledge organization, and knowledge engineering.
  • Supported by:
    National Social Science Foundation of China (21BTQ011).

Abstract: When new requirements arise during project development, natural language processing tools and resource analysis plugins have to go through requirement analysis again and are often redeveloped from scratch. To address this, this paper proposes a business-oriented on-demand resource analysis solution. First, a requirement-to-code method for on-demand resource analysis is proposed, and a requirement concept indexing model is built for the requirement text itself. This model outperforms the other classification models it is compared with in accuracy, recall, and F1 score. Second, based on the association between requirement text and code, a mapping mechanism from requirement text to code library categories is established. The mapping results are evaluated with precision@K, and the final precision reaches 60%, which indicates a certain practical value. In summary, this paper explores a set of key technologies for on-demand resource analysis that can parse requirements and associate them with code. The approach runs through the whole process of requirement text classification, code library classification, code library retrieval, and plugin generation, forming a complete "requirement-code-plugin-analysis" business loop, and experiments verify its effectiveness for on-demand resource analysis. The work offers ideas for business requirement analysis and software reuse. Compared with existing large language models used for business requirement analysis and code generation, the proposed method focuses on implementing the full process of reusing plugin code that carries business characteristics within a specific business domain.

Key words: Natural language processing, Requirements model, Code reuse, Text parsing, Code categorization, Code retrieval
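
As an illustration of the precision@K metric named in the abstract, the following is a minimal sketch under one common definition: the fraction of the top K returned code library categories that are relevant to the requirement. The abstract does not state which variant of precision@K or which value of K the paper uses, so the definition, the function name `precision_at_k`, and the category labels below are all assumptions made for illustration only.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set.

    Assumed definition for illustration; the paper may use a different variant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k (not len(top_k)) so that short result lists are penalized.
    return hits / k


# Hypothetical ranked code library categories returned for one requirement text.
ranked_categories = ["tokenizer", "crawler", "parser", "scheduler", "classifier"]
# Hypothetical ground-truth categories for that requirement.
relevant_categories = {"tokenizer", "parser", "classifier"}

score = precision_at_k(ranked_categories, relevant_categories, k=5)
print(f"precision@5 = {score:.2f}")
```

In this toy case three of the five returned categories are relevant. A corpus-level figure would typically be the mean of this score over all test requirements, which is again an assumption about the evaluation protocol rather than something the abstract specifies.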

CLC number: TP391