Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Bibliographic Details
Title: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Authors: Ghilardi, Davide; Belotti, Federico; Molinari, Marco
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Description: Sparse Autoencoders (SAEs) have recently been employed as an unsupervised approach for understanding the inner workings of Large Language Models (LLMs). They reconstruct the model's activations with a sparse linear combination of interpretable features. However, training SAEs is computationally intensive, especially as models grow in size and complexity. To address this challenge, we propose a novel training strategy that reduces the number of trained SAEs from one per layer to one for a given group of contiguous layers. Our experimental results on Pythia 160M highlight a speedup of up to 6x without compromising the reconstruction quality and performance on downstream tasks. Therefore, layer clustering presents an efficient approach to train SAEs in modern LLMs.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2410.21508
Accession Number: edsarx.2410.21508
Database: arXiv
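
Illustrative note: the description above says that one SAE is trained per group of contiguous layers instead of one per layer. The sketch below shows only the bookkeeping that idea implies. It is not the authors' method: this record does not say how the groups are chosen, so the near-equal split used here is an assumption, and the function name `contiguous_layer_groups`, the layer count of 12, and the group count of 4 are hypothetical values picked for illustration.

```python
def contiguous_layer_groups(n_layers: int, n_groups: int) -> list[list[int]]:
    """Split layer indices 0..n_layers-1 into n_groups contiguous groups.

    Assumption: groups are as equal in size as possible. The record does
    not state the actual grouping criterion used in the paper.
    """
    if not 1 <= n_groups <= n_layers:
        raise ValueError("n_groups must be between 1 and n_layers")
    base, extra = divmod(n_layers, n_groups)
    groups = []
    start = 0
    for g in range(n_groups):
        # The first `extra` groups each take one additional layer.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


# Hypothetical setting: 12 layers shared among 4 SAEs.
groups = contiguous_layer_groups(12, 4)

# Map each layer to the index of the single SAE that would serve its group.
sae_assignment = {
    layer: group_id
    for group_id, layers in enumerate(groups)
    for layer in layers
}

print(groups)
print(f"SAEs to train: {len(groups)} instead of 12")
```

With this assignment, activations from every layer in a group would be routed to the same SAE, so the number of SAEs to train drops from the number of layers to the number of groups.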