CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity

Bibliographic Details
Title: CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Authors: Yu, Zhengmin; Zeng, Jiutian; Chen, Siyi; Xu, Wenhan; Xu, Dandan; Liu, Xiangyu; Ying, Zonghao; Wang, Nan; Zhang, Yuan; Yang, Min
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Cryptography and Security
Description: Over the past year, there has been a notable rise in the use of large language models (LLMs) for academic research and industrial practice within the cybersecurity field. However, there remains a lack of comprehensive, publicly accessible benchmarks for evaluating the performance of LLMs on cybersecurity tasks. To address this gap, we introduce CS-Eval, a publicly accessible, comprehensive, and bilingual LLM benchmark specifically designed for cybersecurity. CS-Eval synthesizes research hotspots from academia and practical applications from industry, curating a diverse set of high-quality questions across 42 categories within cybersecurity, systematically organized into three cognitive levels: knowledge, ability, and application. Through an extensive evaluation of a wide range of LLMs using CS-Eval, we have uncovered valuable insights. For instance, while GPT-4 generally excels overall, other models may outperform it in certain specific subcategories. Additionally, by conducting evaluations over several months, we observed significant improvements in many LLMs' abilities to solve cybersecurity tasks. The benchmark is now publicly available at https://github.com/CS-EVAL/CS-Eval.
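The abstract describes a benchmark of categorized questions graded at three cognitive levels. Below is a minimal sketch of how one might score a model on such items; the field names (question, choices, answer, category, level) and the multiple-choice format are assumptions for illustration only, as the actual data schema is defined in the CS-Eval repository linked above.

```python
# Hypothetical sketch of exact-match scoring on CS-Eval-style items.
# Field names and format are assumed, not taken from the CS-Eval repo.
from dataclasses import dataclass
from typing import Callable

@dataclass
class BenchmarkItem:
    question: str
    choices: dict[str, str]  # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str              # gold choice label, e.g. "B"
    category: str            # one of the benchmark's 42 cybersecurity categories
    level: str               # "knowledge", "ability", or "application"

def accuracy(items: list[BenchmarkItem], model: Callable[[str], str]) -> float:
    """Exact-match accuracy: `model` maps a prompt string to a choice label."""
    correct = 0
    for item in items:
        options = "\n".join(f"{label}. {text}" for label, text in item.choices.items())
        prompt = f"{item.question}\n{options}\nAnswer with the choice label only."
        if model(prompt).strip().upper().startswith(item.answer):
            correct += 1
    return correct / len(items) if items else 0.0
```

Per-category or per-level results, as reported in the paper's subcategory comparisons, would follow by grouping items on the category or level field before averaging.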
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2411.16239
Accession Number: edsarx.2411.16239
Database: arXiv