Tactical Missile Technology (战术导弹技术)

2025, Issue 01 (No. 229), pp. 42-52

Research on an Evaluation System Framework for AIGC Military Large Models

Zhang Long (张龙); Wang Shu (王数); Lei Zhen (雷震); Feng Xuanming (冯轩铭); Yang Bo (杨波)

DOI: 10.16358/j.issn.1009-1300.20240112

Affiliation: Institute of Systems Engineering, Academy of Military Sciences (军事科学院系统工程研究院)

Abstract:

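Breakthroughs in the key technologies of generative artificial intelligence (AI-Generated Content, AIGC) are driving the application of multimodal large language models (MLLMs) in the military vertical domain, but the evaluation system and evaluation indicators for these applications remain incomplete. To address this problem, the paper combines top-down forward design with bottom-up aggregated evaluation to construct an evaluation system framework for military large models. The framework comprises "four domains" (intelligent military requirements, intelligent scenario tasks, system performance evaluation, and system-of-systems effectiveness evaluation) and "three dimensions" (basic support services, the algorithm indicator system, and comprehensive security protection). The paper proposes the main dimensions, key indicators, and basic process for evaluating large models, and gives a corresponding evaluation indicator system that combines qualitative and quantitative measures. The aim is to provide evaluation support for military large models that empower equipment systems and operational effectiveness.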

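Keywords: generative artificial intelligence (AIGC); multimodal large language models (MLLMs); military large models; intelligentization; evaluation; system-of-systems effectiveness; system framework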

Basic information:

CLC number: E91

Citation:

Zhang Long, Wang Shu, Lei Zhen, et al. Research on an Evaluation System Framework for AIGC Military Large Models[J]. Tactical Missile Technology, 2025, No. 229(01): 42-52. DOI: 10.16358/j.issn.1009-1300.20240112.
