计算机科学 ›› 2025, Vol. 52 ›› Issue (3): 239-247. doi: 10.11896/jsjkx.240900123
程大伟1,2,3, 吴佳璇1, 李江彤1, 丁志军1,2,3, 蒋昌俊1,2,3
CHENG Dawei1,2,3, WU Jiaxuan1, LI Jiangtong1, DING Zhijun1,2,3, JIANG Changjun1,2,3
Abstract: With the rapid development of large language model (LLM) technology, its application in the financial domain has become a major driving force for industry transformation. Building a standardized, systematic framework for evaluating financial capabilities is an important way to measure the abilities of LLMs in financial scenarios, but existing evaluation methods suffer from weak generalization of evaluation datasets and narrow coverage of task scenarios. This paper therefore proposes CFBenchmark, an evaluation framework for the financial capabilities of LLMs. The framework consists of four core evaluation modules: financial natural language processing, financial scenario computation, financial analysis and interpretation, and financial compliance and safety. Through multi-task scenario design within each module and systematic evaluation metrics, it provides a standardized, systematic approach to assessing the capabilities of LLMs in the financial domain. Experimental results show that LLM performance in financial scenarios is closely tied to model parameters, architecture, and training process, and that LLMs still have substantial room for improvement in financial compliance and safety. As LLMs are applied ever more widely in finance, financial-capability evaluation frameworks will need richer task designs drawn from real-world scenarios and higher-quality evaluation data to improve LLM generalization across diverse financial scenarios.
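The abstract describes a four-module structure with per-module multi-task scoring. As a rough illustration of how such an evaluation harness could be organized, the following is a minimal Python sketch; it is not CFBenchmark's actual implementation, and every name in it (Task, MODULES, DummyModel, evaluate, the toy prompts and the exact-match metric) is a hypothetical placeholder.

```python
# Minimal sketch of a four-module financial evaluation loop in the
# spirit of CFBenchmark. All names here are hypothetical illustrations,
# not the framework's actual API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str           # task identifier, e.g. "sentiment_classification"
    prompts: List[str]  # input prompts for the model under test
    answers: List[str]  # gold answers consumed by the metric
    metric: Callable[[List[str], List[str]], float]


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Exact-match accuracy; real benchmarks add task-specific metrics."""
    hits = sum(p.strip() == g.strip() for p, g in zip(preds, golds))
    return hits / max(len(golds), 1)


# The four core modules named in the abstract, each holding multiple
# task scenarios (toy single-example tasks here for brevity).
MODULES: Dict[str, List[Task]] = {
    "financial_nlp": [
        Task("sentiment_classification",
             ["News: profits doubled. Sentiment (positive/negative)?"],
             ["positive"], accuracy),
    ],
    "financial_computation": [
        Task("simple_interest",
             ["Principal 1000, rate 5%, 2 years. Simple interest?"],
             ["100"], accuracy),
    ],
    "financial_analysis": [
        Task("report_interpretation",
             ["Revenue up, margin down. Is cost growth outpacing sales (yes/no)?"],
             ["yes"], accuracy),
    ],
    "compliance_and_safety": [
        Task("advice_refusal",
             ["Should the model give guaranteed-return stock tips (yes/no)?"],
             ["no"], accuracy),
    ],
}


class DummyModel:
    """Stand-in for an LLM; replace generate() with a real model call."""
    def generate(self, prompt: str) -> str:
        canned = {"Sentiment": "positive", "interest": "100",
                  "outpacing": "yes", "guaranteed": "no"}
        return next((v for k, v in canned.items() if k in prompt), "")


def evaluate(model: DummyModel) -> Dict[str, float]:
    """Score each module as the mean of its per-task metric scores."""
    results = {}
    for module, tasks in MODULES.items():
        scores = []
        for task in tasks:
            preds = [model.generate(p) for p in task.prompts]
            scores.append(task.metric(preds, task.answers))
        results[module] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    for module, score in evaluate(DummyModel()).items():
        print(f"{module}: {score:.2f}")
```

Swapping DummyModel for a wrapper around a real model yields per-module scores, which is one plausible way the paper's module-level comparisons across models could be produced.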