Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

Bibliographic Details
Title: Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
Authors: Hu, Bingshan, Huang, Zhiming, Mehta, Nishant A., Hegde, Nidhi
Publication Year: 2021
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning
Description: In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor. (A generic illustrative sketch of a Laplace-noise private UCB index, not the paper's algorithm, follows this record.)
Comment: 40 pages. New in v3: (i) Removed Hybrid-UCB (although its analysis is correct to our knowledge); (ii) Added Lazy-DP-TS from UAI 2022 paper of Hu and Hegde (2022)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2102.07929
Accession Number: edsarx.2102.07929
Database: arXiv
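
The abstract above describes epsilon-differentially private UCB and Thompson Sampling algorithms only at the level of their regret bounds. The sketch below is a minimal, generic illustration of the common ingredient, a UCB index computed from Laplace-mechanism-perturbed empirical means; it is not the paper's algorithm. The class name PrivateUCB, the noise schedule, and the confidence width are assumptions chosen for illustration, and adding fresh noise every round as done here does not by itself certify epsilon-DP over the full reward stream; the paper's algorithms manage the privacy budget more carefully.

import numpy as np

class PrivateUCB:
    """Illustrative sketch of a Laplace-noise UCB index for stochastic bandits.

    Assumptions (not from the paper): rewards lie in [0, 1], fresh Laplace noise
    with scale 1/epsilon is added to each arm's reward sum every round, and the
    confidence width gets an extra ln(t)/(epsilon * n_j) term for the noise.
    """

    def __init__(self, n_arms, epsilon, rng=None):
        self.n_arms = n_arms
        self.epsilon = epsilon            # privacy parameter
        self.counts = np.zeros(n_arms)    # number of pulls per arm
        self.sums = np.zeros(n_arms)      # cumulative reward per arm
        self.rng = rng or np.random.default_rng(0)

    def select_arm(self, t):
        # Pull each arm once before relying on the index.
        for j in range(self.n_arms):
            if self.counts[j] == 0:
                return j
        # Laplace noise with scale 1/epsilon privatizes the reward sums
        # (sensitivity of a sum of rewards in [0, 1] is 1).
        noisy_sums = self.sums + self.rng.laplace(0.0, 1.0 / self.epsilon, self.n_arms)
        noisy_means = noisy_sums / self.counts
        # Standard UCB width plus an extra term to absorb the privacy noise.
        width = (np.sqrt(2.0 * np.log(t + 1) / self.counts)
                 + np.log(t + 1) / (self.epsilon * self.counts))
        return int(np.argmax(noisy_means + width))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


if __name__ == "__main__":
    # Toy run on Bernoulli arms with hypothetical means.
    means = [0.9, 0.6, 0.5]
    agent = PrivateUCB(n_arms=len(means), epsilon=0.5)
    rng = np.random.default_rng(1)
    for t in range(10_000):
        arm = agent.select_arm(t)
        agent.update(arm, float(rng.random() < means[arm]))
    print("pull counts:", agent.counts)

The extra log(t + 1) / (epsilon * counts) term in the index is what drives an additional exploration cost of order ln(T)/epsilon per suboptimal arm, which is the intuition behind the min{Delta_j, epsilon} dependence in the regret bound quoted in the description.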