Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
| Field | Value |
|---|---|
| Title | Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment |
| Authors | Hu, Bingshan; Huang, Zhiming; Mehta, Nishant A.; Hegde, Nidhi |
| Publication Year | 2021 |
| Collection | Computer Science |
| Subject Terms | Computer Science - Machine Learning |
| Description | In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB- and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor. |
| Comment | 40 pages. New in v3: (i) Removed Hybrid-UCB (although its analysis is correct to our knowledge); (ii) Added Lazy-DP-TS from the UAI 2022 paper of Hu and Hegde (2022) |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2102.07929 |
| Accession Number | edsarx.2102.07929 |
| Database | arXiv |
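To make the setting in the abstract concrete, the sketch below shows the generic pattern behind differentially private UCB for stochastic bandits: empirical reward sums are perturbed with Laplace noise calibrated to the privacy parameter $\epsilon$ before the confidence bonus is applied. This is a minimal illustration of the Laplace-noise mechanism only, not the paper's actual anytime algorithm; the function name `dp_ucb`, the noise scale, and the bonus term are simplifying assumptions for exposition.

```python
import numpy as np

def dp_ucb(means, T, epsilon, rng):
    """Illustrative Laplace-noised UCB sketch (NOT the paper's algorithm).

    Each round, every arm's empirical mean is perturbed with
    Laplace(0, 1/(n_j * epsilon)) noise before adding a standard
    UCB exploration bonus; the arm with the largest noisy index is pulled.
    Rewards are Bernoulli(means[j]).
    """
    K = len(means)
    pulls = np.zeros(K, dtype=int)   # n_j: number of pulls of arm j
    sums = np.zeros(K, dtype=float)  # running reward sum of arm j
    for t in range(T):
        if t < K:
            j = t  # initialization: pull each arm once
        else:
            # Laplace noise scaled to the per-arm sample count and epsilon
            noise = rng.laplace(0.0, 1.0 / (pulls * epsilon))
            index = sums / pulls + noise + np.sqrt(2.0 * np.log(t + 1) / pulls)
            j = int(np.argmax(index))
        pulls[j] += 1
        sums[j] += rng.binomial(1, means[j])
    return pulls

# Usage: three arms with suboptimality gaps 0.4 and 0.5, epsilon = 1.
rng = np.random.default_rng(0)
pulls = dp_ucb([0.9, 0.5, 0.4], T=2000, epsilon=1.0, rng=rng)
```

The $O\big(\sum_{j:\Delta_j>0} \ln(T)/\min\{\Delta_j,\epsilon\}\big)$ bound in the abstract says that, for such noise-perturbed index policies, the per-arm exploration cost is governed by whichever is smaller of the gap $\Delta_j$ and the privacy budget $\epsilon$.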