Report
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Title: | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models |
---|---|
Authors: | Majumder, Bodhisattwa Prasad; Surana, Harshit; Agarwal, Dhruv; Mishra, Bhavana Dalvi; Meena, Abhijeetsingh; Prakhar, Aryan; Vora, Tirth; Khot, Tushar; Sabharwal, Ashish; Clark, Peter |
Publication year: | 2024 |
Collection: | Computer Science |
Subject terms: | Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning |
Description: | Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress. Comment: Website: https://github.com/allenai/discoverybench |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2407.01725 |
Accession number: | edsarx.2407.01725 |
Database: | arXiv |