Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Bibliographic Details
Title:	Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Authors:	Ghilardi, Davide; Belotti, Federico; Molinari, Marco
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Description:	Sparse Autoencoders (SAEs) have recently been employed as an unsupervised approach for understanding the inner workings of Large Language Models (LLMs). They reconstruct the model's activations with a sparse linear combination of interpretable features. However, training SAEs is computationally intensive, especially as models grow in size and complexity. To address this challenge, we propose a novel training strategy that reduces the number of trained SAEs from one per layer to one for a given group of contiguous layers. Our experimental results on Pythia 160M highlight a speedup of up to 6x without compromising the reconstruction quality and performance on downstream tasks. Therefore, layer clustering presents an efficient approach to train SAEs in modern LLMs.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2410.21508
Accession Number:	edsarx.2410.21508
Database:	arXiv
