Direct Value Learning: a Preference-based Approach to Reinforcement Learning

Bibliographic Details
Title: Direct Value Learning: a Preference-based Approach to Reinforcement Learning
Authors: Meunier, David, Deguchi, Yutaka, Akrour, Riad, Suzuki, Einoshin, Schoenauer, Marc, Sebag, Michèle
Contributors: Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Machine Learning and Optimisation (TAO), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Dept. Informatics, ISEE, Kyushu University, Johannes Fürnkranz and Eyke Hüllermeier
Source: ECAI-12 Workshop on Preference Learning: Problems and Applications in AI ; https://inria.hal.science/hal-00932976 ; ECAI-12 Workshop on Preference Learning: Problems and Applications in AI, Aug 2012, Montpellier, France. pp.42-47 ; www2.lirmm.fr/ecai2012/images/stories/ecai_doc/pdf/workshop/W30_PL12-Proceedings.pdf
Publisher: HAL CCSD
Publication Year: 2012
Collection: Université de Rennes 1: Publications scientifiques (HAL)
Subject Terms: ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.6: Learning, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Geographic Subject: Montpellier, France
Description: International audience ; Learning by imitation, among the most promising techniques for reinforcement learning in complex domains, critically depends on the human designer's ability to provide sufficiently many demonstrations of satisfactory quality. The approach presented in this paper, referred to as DIVA (Direct Value Learning for Reinforcement Learning), aims to address both limitations, the number and the quality of demonstrations required, by exploiting simple experiments. The approach stems from a straightforward remark: while it is rather easy to place a robot in a target situation, the quality of its situation will naturally deteriorate under the actions of a naive controller. Demonstrations from such naive controllers can thus be used to learn a value function directly, through a preference learning approach. Under some conditions on the transition model, this value function makes it possible to define an optimal controller. The DIVA approach is experimentally demonstrated by teaching a robot to follow another robot. Importantly, the approach requires neither a robotic simulator nor any pattern-recognition primitive (e.g. seeing the other robot) to be provided.
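The core idea described in the abstract can be illustrated with a toy sketch: since states visited earlier along a deteriorating trajectory are by construction better than later ones, each consecutive pair of states yields a preference, and a value function can be fit with a ranking loss. The following is a minimal illustration, not the authors' implementation; the one-dimensional "distance to the leader" state, the polynomial features, and the logistic ranking loss are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy domain: the robot starts in the target situation
# (distance 0 to the robot it should follow) and a naive controller
# lets the distance drift upward, i.e. the situation deteriorates.
def naive_trajectory(length=30):
    d, states = 0.0, []
    for _ in range(length):
        states.append(d)
        d += abs(rng.normal(0.2, 0.1))  # monotone deterioration
    return states

def features(d):
    # simple polynomial features of the scalar state (an assumption)
    return np.array([1.0, d, d * d])

# Preference pairs: a state visited earlier is preferred to the next one.
pairs = []
for _ in range(50):
    traj = naive_trajectory()
    for t in range(len(traj) - 1):
        pairs.append((features(traj[t]), features(traj[t + 1])))

# Fit a linear value function V(s) = w . phi(s) with a logistic ranking
# loss, pushing V(preferred state) above V(less preferred state).
w = np.zeros(3)
lr = 0.05
for _ in range(200):
    grad = np.zeros(3)
    for phi_good, phi_bad in pairs:
        diff = phi_good - phi_bad
        p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(good ranked above bad)
        grad += (p - 1.0) * diff             # gradient of -log p
    w -= lr * grad / len(pairs)

def value(d):
    return w @ features(d)

# The learned value function ranks near-target states above degraded ones.
print(value(0.0) > value(2.0) > value(5.0))
```

Greedily maximizing such a learned value over the successor states reachable in one step then yields a controller, which is the sense in which the abstract's value function "enables" control under conditions on the transition model.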
Document Type: conference object
Language: English
Relation: hal-00932976; https://inria.hal.science/hal-00932976; https://inria.hal.science/hal-00932976/document; https://inria.hal.science/hal-00932976/file/06-sebag.pdf
Availability: https://inria.hal.science/hal-00932976
https://inria.hal.science/hal-00932976/document
https://inria.hal.science/hal-00932976/file/06-sebag.pdf
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.B08D31D4
Database: BASE