Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
| Field | Value |
|---|---|
| Title | Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment |
| Authors | Hu, Bingshan; Huang, Zhiming; Mehta, Nishant A.; Hegde, Nidhi |
| Publication Year | 2021 |
| Collection | Computer Science |
| Subject Terms | Computer Science - Machine Learning |
| Description | In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB- and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor. |
| Comment | 40 pages. New in v3: (i) Removed Hybrid-UCB (although its analysis is correct to our knowledge); (ii) Added Lazy-DP-TS from the UAI 2022 paper of Hu and Hegde (2022) |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2102.07929 |
| Accession Number | edsarx.2102.07929 |
| Database | arXiv |
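To make the setting in the abstract concrete, the sketch below shows the generic pattern behind differentially private UCB for stochastic bandits: empirical reward sums are perturbed with Laplace noise calibrated to the privacy parameter $\epsilon$ before the confidence bonus is applied. This is a minimal illustration of the Laplace-noise mechanism only, not the paper's actual anytime algorithm; the function name `dp_ucb`, the noise scale, and the bonus term are simplifying assumptions for exposition.

```python
import numpy as np

def dp_ucb(means, T, epsilon, rng):
    """Illustrative Laplace-noised UCB sketch (NOT the paper's algorithm).

    Each round, every arm's empirical mean is perturbed with
    Laplace(0, 1/(n_j * epsilon)) noise before adding a standard
    UCB exploration bonus; the arm with the largest noisy index is pulled.
    Rewards are Bernoulli(means[j]).
    """
    K = len(means)
    pulls = np.zeros(K, dtype=int)   # n_j: number of pulls of arm j
    sums = np.zeros(K, dtype=float)  # running reward sum of arm j
    for t in range(T):
        if t < K:
            j = t  # initialization: pull each arm once
        else:
            # Laplace noise scaled to the per-arm sample count and epsilon
            noise = rng.laplace(0.0, 1.0 / (pulls * epsilon))
            index = sums / pulls + noise + np.sqrt(2.0 * np.log(t + 1) / pulls)
            j = int(np.argmax(index))
        pulls[j] += 1
        sums[j] += rng.binomial(1, means[j])
    return pulls

# Usage: three arms with suboptimality gaps 0.4 and 0.5, epsilon = 1.
rng = np.random.default_rng(0)
pulls = dp_ucb([0.9, 0.5, 0.4], T=2000, epsilon=1.0, rng=rng)
```

The $O\big(\sum_{j:\Delta_j>0} \ln(T)/\min\{\Delta_j,\epsilon\}\big)$ bound in the abstract says that, for such noise-perturbed index policies, the per-arm exploration cost is governed by whichever is smaller of the gap $\Delta_j$ and the privacy budget $\epsilon$.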