Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

Bibliographic Details
Title: Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
Authors: Meng, Si Yi, Orvieto, Antonio, Cao, Daniel Yiming, De Sa, Christopher
Publication Year: 2024
Collection: Computer Science; Mathematics
Subject Terms: Computer Science - Machine Learning, Mathematics - Optimization and Control
Description: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical step size $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized near the solution: but does this suffice globally? In one dimension, we show that a step size less than $1/\lambda$ suffices for global convergence. However, for all step sizes between $1/\lambda$ and the critical step size $2/\lambda$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/\lambda$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.05033
Accession Number: edsarx.2406.05033
Database: arXiv
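The step-size regimes described in the abstract can be sketched numerically. The following is a minimal illustration, not taken from the paper itself: it uses an assumed symmetric non-separable dataset (one point at $x=1$ labeled $+1$ and one labeled $-1$), for which the minimizer is $w^\ast = 0$, the Hessian eigenvalue is $\lambda = 1/4$, and hence $1/\lambda = 4$ and the critical step size $2/\lambda = 8$. A step size below $1/\lambda$ converges to the minimizer, while one above $2/\lambda$ settles into a stable period-2 cycle.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gd_logistic_1d(step, w0=1.0, iters=200):
    """GD on L(w) = 0.5*[log(1+e^{-w}) + log(1+e^{w})], the logistic
    loss for two points at x = 1 with opposite labels (non-separable,
    minimizer w* = 0). The gradient simplifies to sigmoid(w) - 0.5.
    """
    w = w0
    for _ in range(iters):
        w -= step * (sigmoid(w) - 0.5)
    return w

# Hessian at w* = 0 is lambda = 1/4, so 1/lambda = 4 and 2/lambda = 8.
w_small = gd_logistic_1d(step=3.0)   # below 1/lambda: converges to 0
w_large = gd_logistic_1d(step=10.0)  # above 2/lambda: stable 2-cycle
print(abs(w_small), abs(w_large))
```

With the small step size the iterates contract to $w^\ast = 0$; with the large one they bounce between $\pm w_c$ for some $w_c > 0$ satisfying $2 w_c = \eta\,(\sigma(w_c) - 1/2)$, matching the cycle behaviour the abstract describes.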