Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment

Bibliographic Details
Title:	Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
Authors:	Zhou, Weichao; Li, Wenchao
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Description:	Many imitation learning (IL) algorithms use inverse reinforcement learning (IRL) to infer a reward function that aligns with the demonstration. However, the inferred reward functions often fail to capture the underlying task objectives. In this paper, we propose a novel framework for IRL-based IL that prioritizes task alignment over conventional data alignment. Our framework is a semi-supervised approach that leverages expert demonstrations as weak supervision to derive a set of candidate reward functions that align with the task rather than only with the data. It then adopts an adversarial mechanism to train a policy with this set of reward functions to gain a collective validation of the policy's ability to accomplish the task. We provide theoretical insights into this framework's ability to mitigate task-reward misalignment and present a practical implementation. Our experimental results show that our framework outperforms conventional IL baselines in complex and transfer learning scenarios. Comment: arXiv admin note: substantial text overlap with arXiv:2306.01731
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2410.23680
Accession Number:	edsarx.2410.23680
Database:	arXiv

