Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators

Bibliographic Details
Title:	Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators
Authors:	Zhang, Kunpeng; Li, Zongjie; Wu, Daoyuan; Wang, Shuai; Xia, Xin
Source:	The 34th USENIX Security Symposium, 2025
Publication Year:	2025
Collection:	Computer Science
Subject Terms:	Computer Science - Software Engineering
Description:	Modern software often accepts inputs with highly complex grammars. Recent advances in large language models (LLMs) have shown that they can synthesize high-quality natural language text and code that conforms to the grammar of a given input format. Nevertheless, LLMs are often unable to generate non-textual outputs, such as images, videos, and PDF files, or can do so only at prohibitive cost. This limitation hinders the application of LLMs in grammar-aware fuzzing. We present a novel approach to enabling grammar-aware fuzzing over non-textual inputs. We employ LLMs to synthesize and mutate input generators, in the form of Python scripts, that generate data conforming to the grammar of a given input format. The non-textual data yielded by the input generators are then further mutated by a traditional fuzzer (AFL++) to explore the software input space effectively. Our approach, named G2FUZZ, features a hybrid strategy that combines a holistic search driven by LLMs with a local search driven by industrial-quality fuzzers. It has two key advantages: (1) LLMs are good at synthesizing and mutating input generators and at helping the search escape local optima, which yields a synergistic effect when combined with mutation-based fuzzers; (2) LLMs are invoked only when really needed, which significantly reduces the cost of LLM usage. We have evaluated G2FUZZ on a variety of input formats, including TIFF images, MP4 audio files, and PDF files. The results show that G2FUZZ outperforms state-of-the-art (SOTA) tools such as AFL++, Fuzztruction, and FormatFuzzer in terms of code coverage and bug finding across most programs tested on three platforms: UNIFUZZ, FuzzBench, and MAGMA. Comment: USENIX Security 2025
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2501.19282
Accession Number:	edsarx.2501.19282
Database:	arXiv
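
The description in this record refers to input generators written as Python scripts that emit data conforming to a binary format. As a rough illustration only, the following is a minimal sketch of what such a generator might look like for an uncompressed grayscale TIFF image. It is not taken from the paper: the function names `ifd_entry` and `generate_tiff`, the choice of tags, and the random pixel content are all assumptions made for this example.

```python
import random
import struct

SHORT, LONG = 3, 4  # TIFF field type codes (16-bit and 32-bit unsigned)


def ifd_entry(tag, field_type, value):
    """Pack one 12-byte IFD entry that stores a single value inline (hypothetical helper)."""
    if field_type == SHORT:
        # A SHORT value is left-justified in the 4-byte value field.
        return struct.pack("<HHIHH", tag, SHORT, 1, value, 0)
    return struct.pack("<HHII", tag, LONG, 1, value)


def generate_tiff(width, height, seed=0):
    """Return the bytes of an uncompressed 8-bit grayscale TIFF (illustrative only).

    width and height must each fit in 16 bits because they are stored as SHORT.
    """
    rng = random.Random(seed)
    pixels = bytes(rng.randrange(256) for _ in range(width * height))

    n_entries = 9
    # Header (8 bytes) + entry count (2) + entries + next-IFD offset (4).
    data_offset = 8 + 2 + n_entries * 12 + 4

    entries = [  # tags are listed in ascending order
        ifd_entry(256, SHORT, width),       # ImageWidth
        ifd_entry(257, SHORT, height),      # ImageLength
        ifd_entry(258, SHORT, 8),           # BitsPerSample
        ifd_entry(259, SHORT, 1),           # Compression: none
        ifd_entry(262, SHORT, 1),           # PhotometricInterpretation: BlackIsZero
        ifd_entry(273, LONG, data_offset),  # StripOffsets
        ifd_entry(277, SHORT, 1),           # SamplesPerPixel
        ifd_entry(278, SHORT, height),      # RowsPerStrip
        ifd_entry(279, LONG, len(pixels)),  # StripByteCounts
    ]
    assert len(entries) == n_entries

    # Little-endian byte order mark, magic number 42, first IFD at offset 8.
    header = struct.pack("<2sHI", b"II", 42, 8)
    ifd = struct.pack("<H", n_entries) + b"".join(entries) + struct.pack("<I", 0)
    return header + ifd + pixels
```

In a workflow like the one the description outlines, the bytes returned by such a script would be written to a file and handed to a mutation-based fuzzer such as AFL++ as a seed; how G2FUZZ itself structures its generators is not specified in this record.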
