Report
Investigating Efficacy of Perplexity in Detecting LLM-Generated Code
Title: | Investigating Efficacy of Perplexity in Detecting LLM-Generated Code |
---|---|
Authors: | Xu, Jinwei, Zhang, He, Yang, Yanjin, Cheng, Zeru, Lyu, Jun, Liu, Bohan, Zhou, Xin, Yang, Lanxin, Bacchelli, Alberto, Chiam, Yin Kia, Chiew, Thiam Kian |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Software Engineering |
Description: | Large language model-generated code (LLMgCode) has become increasingly prevalent in software development. Many studies report that LLMgCode has more quality and security issues than human-authored code (HaCode). It is common for LLMgCode to mix with HaCode in a code change, while the change is signed by only human developers, without being carefully checked. Many automated methods have been proposed to detect LLMgCode from HaCode, in which the perplexity-based method (PERPLEXITY for short) is the state-of-the-art method. However, the efficacy evaluation of PERPLEXITY has focused on the detection accuracy. In this article, we are interested in whether PERPLEXITY is good enough in a wider range of realistic evaluation settings. To this end, we devise a large-scale dataset that includes 11,664 HaCode snippets and 13,164 LLMgCode snippets, and based on that, we carry out a family of experiments to compare PERPLEXITY against feature-based and pre-training-based methods from three perspectives: (1) detection accuracy in terms of programming language, degree of difficulty, and scale of solution, (2) generalization capability, and (3) inference efficiency. The experimental results show that PERPLEXITY has the best generalization capability while it has low accuracy and efficiency in most cases. Based on the experimental results and detection mechanism of PERPLEXITY, we discuss implications into both the strengths and limitations of PERPLEXITY, e.g., PERPLEXITY is unsuitable for high-level programming languages while it has good interpretability. As the first large-scale investigation on detecting LLMgCode from HaCode, this article provides a wide range of evidence for future improvement. Comment: 15 pages, 6 images, 10 tables; submitted to TSE journal |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2412.16525 |
Accession Number: | edsarx.2412.16525 |
Database: | arXiv |