Adaptive Width Neural Networks

Bibliographic Details
Title: Adaptive Width Neural Networks
Authors: Errica, Federico; Christiansen, Henrik; Zaverkin, Viktor; Niepert, Mathias; Alesiani, Francesco
Publication Year: 2025
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning; Computer Science - Artificial Intelligence
Description: For almost 70 years, researchers have mostly relied on hyper-parameter tuning to pick the width of neural networks' layers out of many possible choices. This paper challenges the status quo by introducing an easy-to-use technique to learn an unbounded width of a neural network's layer during training. The technique does not rely on alternate optimization nor hand-crafted gradient heuristics; rather, it jointly optimizes the width and the parameters of each layer via simple backpropagation. We apply the technique to a broad range of data domains such as tables, images, texts, and graphs, showing how the width adapts to the task's difficulty. By imposing a soft ordering of importance among neurons, it is possible to truncate the trained network at virtually zero cost, achieving a smooth trade-off between performance and compute resources in a structured way. Alternatively, one can dynamically compress the network with no performance degradation. In light of recent foundation models trained on large datasets, believed to require billions of parameters and where hyper-parameter tuning is unfeasible due to huge training costs, our approach stands as a viable alternative for width learning. (An illustrative sketch of the width-gating idea follows this record.)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2501.15889
Accession Number: edsarx.2501.15889
Database: arXiv
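
The abstract describes two key mechanisms: learning a layer's width jointly with its weights via ordinary backpropagation, and a soft ordering of importance among neurons that makes post-hoc truncation cheap. The PyTorch sketch below illustrates one plausible reading of this idea under stated assumptions: a per-neuron gate that decays monotonically with neuron index, driven by a single learnable width scalar. The class name `AdaptiveWidthLinear`, the sigmoid gate, and the penalty coefficient are illustrative assumptions, not the authors' formulation; see the paper at the URL above for the actual method.

```python
import torch
import torch.nn as nn


class AdaptiveWidthLinear(nn.Module):
    """Linear layer with a learned, softly ordered importance gate per neuron.

    Minimal sketch only: the sigmoid gate and the single scalar width
    parameter are hypothetical choices for illustration, not the paper's
    exact parametrization.
    """

    def __init__(self, in_features: int, max_width: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, max_width)
        # Learnable log-width; exp(.) keeps the effective width positive
        # and lets gradients grow or shrink it freely during training.
        self.log_width = nn.Parameter(torch.tensor(3.0))
        self.register_buffer("idx", torch.arange(max_width, dtype=torch.float32))

    def gates(self) -> torch.Tensor:
        # Monotonically decreasing gate over neuron indices: neuron i is
        # always at least as important as neuron i+1, which is what makes
        # post-hoc truncation by index safe.
        return torch.sigmoid(torch.exp(self.log_width) - self.idx)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * self.gates()


# Toy usage: the width parameter is optimized jointly with the weights by
# plain backpropagation; a small penalty on the sum of gates (the
# "effective width") discourages unnecessarily wide layers.
layer = AdaptiveWidthLinear(in_features=16, max_width=256)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
x = torch.randn(32, 16)
loss = layer(x).pow(2).mean() + 1e-3 * layer.gates().sum()
loss.backward()
opt.step()

# After training, low-gate neurons can be dropped to trade accuracy for
# compute, mirroring the truncation described in the abstract.
kept = int((layer.gates() > 0.5).sum())
print(f"effective width: {kept} of 256 neurons")
```

Because the gate is monotone in the neuron index, keeping only the first `kept` neurons is equivalent to thresholding the gates, which is what makes the performance/compute trade-off "structured" in the abstract's sense.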