Report
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
| Title | Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment |
|---|---|
| Authors | Zhou, Weichao; Li, Wenchao |
| Publication Year | 2024 |
| Collection | Computer Science |
| Subject Terms | Computer Science - Machine Learning; Computer Science - Artificial Intelligence |
| Description | Many imitation learning (IL) algorithms use inverse reinforcement learning (IRL) to infer a reward function that aligns with the demonstrations. However, the inferred reward functions often fail to capture the underlying task objectives. In this paper, we propose a novel framework for IRL-based IL that prioritizes task alignment over conventional data alignment. Our framework is a semi-supervised approach that leverages expert demonstrations as weak supervision to derive a set of candidate reward functions that align with the task rather than only with the data. It then adopts an adversarial mechanism to train a policy with this set of reward functions to gain a collective validation of the policy's ability to accomplish the task. We provide theoretical insights into this framework's ability to mitigate task-reward misalignment and present a practical implementation. Our experimental results show that our framework outperforms conventional IL baselines in complex and transfer learning scenarios. Comment: arXiv admin note: substantial text overlap with arXiv:2306.01731 |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2410.23680 |
| Accession Number | edsarx.2410.23680 |
| Database | arXiv |