Computer Science, 2025, Vol. 52, Issue (11A): 241000129-7. doi: 10.11896/jsjkx.241000129
• Artificial Intelligence •
LIANG Binghao, ZHANG Chuangang, YUAN Mingming