Bibliographic details
Title: |
Offline Planning and Online Learning Under Recovering Rewards. |
Authors: |
Simchi-Levi, David1,2,3 (AUTHOR) dslevi@mit.edu, Zheng, Zeyu4 (AUTHOR) zyzheng@berkeley.edu, Zhu, Feng3 (AUTHOR) fengzhu@mit.edu |
Source: |
Management Science. Jan2025, Vol. 71 Issue 1, p298-317. 20p. |
Subject terms: |
*DATA science, *ELECTRONIC commerce, MULTI-armed bandit problem (Probability theory), ONLINE education, REGRET |
Abstract: |
Motivated by emerging applications, such as live-streaming e-commerce, promotions, and recommendations, we introduce and solve a general class of nonstationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to K (≥ 1) out of N different arms in each time period and (ii) the expected reward of an arm immediately drops after it is pulled and then nonparametrically recovers as the arm's idle time increases. With the objective of maximizing the expected cumulative reward over T time periods, we design a class of purely periodic policies that jointly set a period to pull each arm. For the proposed policies, we prove performance guarantees for both the offline and the online problems. For the offline problem when all model parameters are known, the proposed periodic policy obtains a long-run approximation ratio that is at the order of 1 − O(1/√K), which is asymptotically optimal when K grows to infinity. For the online problem when the model parameters are unknown and need to be dynamically learned, we integrate the offline periodic policy with the upper confidence bound procedure to construct an online policy. The proposed online policy is proved to approximately have Õ(N√T) regret against the offline benchmark. Our framework and policy design may shed light on broader offline planning and online learning applications with nonstationary and recovering rewards. This paper was accepted by J. George Shanthikumar, data science. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2021.04202. [ABSTRACT FROM AUTHOR] |
|
Copyright of Management Science is the property of INFORMS: Institute for Operations Research and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
Database: |
Business Source Index |