A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Bibliographic Details
Title: A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
Authors: Zhehui Chen, Tuo Zhao, Enlu Zhou, Tianyi Liu
Source: Stochastic Systems, 11:307-323
Publisher Information: Institute for Operations Research and the Management Sciences (INFORMS), 2021.
Publication Year: 2021
Subject Terms: Statistics and Probability, Momentum (technical analysis), Stochastic gradient descent, Optimization problem, Computer science, Modeling and Simulation, Bayesian probability, Applied mathematics, Deep neural networks, Management Science and Operations Research, Statistics, Probability and Uncertainty, Heavy traffic approximation
Description: The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks and variational Bayesian inference). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we analyze the algorithmic behavior of MSGD through diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape from saddle points but hurts convergence within the neighborhood of optima (in the absence of step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
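For illustration, below is a minimal sketch (not from the paper) of the heavy-ball MSGD update the abstract refers to, run on a hypothetical toy objective f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2, which has a strict saddle point at the origin and isolated local minima at (+1, 0) and (-1, 0). The objective, step size eta, momentum mu, and noise level sigma are illustrative assumptions, not values from the paper.

import numpy as np

# Hypothetical toy objective f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2:
# strict saddle at (0, 0), isolated local minima at (+1, 0) and (-1, 0).
def grad(x):
    return np.array([x[0]**3 - x[0], x[1]])

rng = np.random.default_rng(0)
x = np.array([1e-3, 1e-3])        # initialize near the strict saddle
v = np.zeros(2)
eta, mu, sigma = 0.01, 0.9, 0.1   # step size, momentum, noise level (assumed)

for t in range(5000):
    g = grad(x) + sigma * rng.standard_normal(2)  # stochastic gradient estimate
    v = mu * v - eta * g                          # momentum (heavy-ball) velocity update
    x = x + v                                     # parameter update

print(x)  # the iterate typically escapes the saddle toward (+1, 0) or (-1, 0)

Because the step size and momentum are fixed here (no annealing), the iterate keeps fluctuating in a neighborhood of the optimum rather than converging to it, consistent with the trade-off described in the abstract.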
ISSN: 1946-5238
DOI: 10.1287/stsy.2021.0083
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::564063907b500f454a63a9d4d5e84039
https://doi.org/10.1287/stsy.2021.0083
Rights: OPEN
Accession Number: edsair.doi...........564063907b500f454a63a9d4d5e84039
Database: OpenAIRE