Bibliographic Details
Title: |
Offline reinforcement learning for ambulance dispatch |
Authors: |
Lamarca Ferrés, Enric |
Contributors: |
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Stephan Robert, Martín Muñoz, Mario |
Publisher: |
Universitat Politècnica de Catalunya |
Publication Year: |
2023 |
Collection: |
Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge |
Subject Terms: |
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial, Reinforcement learning, Data sets, Aprenentatge de Reforç Fora de Línia, Aprenentatge d'Imitació, Clonació Conductual, Conservative Q-Learning, Behaviour Regularized Actor-Critic, Q-Learning, Xarxes Neuronals Profundes, Enviament d'ambulàncies, Cerca Aleatòria, Construcció de conjunt de dades, Offline Reinforcement Learning, Imitation Learning, Behavioural Cloning, Deep Neural Networks, Ambulance Dispatch, Random Search, Dataset Building, Aprenentatge per reforç, Conjunts de dades |
Description: |
This master's thesis focuses on applying offline reinforcement learning techniques to the ambulance dispatch problem: selecting the most appropriate ambulance to dispatch when an incident occurs. The research is part of the SIA-REMU project, conducted in both France and Switzerland. Each incident has an associated priority level: priority 0 incidents are vital emergencies, priority 1 incidents are non-vital emergencies, and priority 2 incidents are non-emergencies. The primary objective of this work is to train reinforcement learning agents capable of prioritizing incidents appropriately when dispatching ambulances. First, a dataset of experiences was constructed from data provided by the Centre de régulation du Centre Hospitalier Universitaire Vaudois (CHUV), which contains valuable information about incidents, interventions, and resources. This dataset served as a static dataset for training reinforcement learning agents in an offline setting, without interaction with an environment. State-of-the-art offline reinforcement learning algorithms were used to train the agents, and their hyperparameters were tuned by performing a random search. To evaluate and test the trained agents, a virtual environment was implemented. Finally, the policies learned by the agents were analyzed to draw meaningful conclusions from the results. |
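The description mentions that agent hyperparameters were tuned by performing a random search. As an illustrative sketch only (the search space, parameter names, and evaluation function below are hypothetical and do not come from the thesis), a minimal random search might look like this, where `evaluate` would stand in for offline training followed by evaluation in the virtual environment:

```python
import random

# Hypothetical search space for an offline RL agent's hyperparameters
# (names and values are illustrative, not taken from the thesis).
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "hidden_units": [64, 128, 256],
}

def sample_config(space, rng):
    """Draw one configuration uniformly at random from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(space, evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations; return the best one found."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy evaluation function standing in for "train offline, then score
# the agent in the virtual environment"; higher is better, max is 0.
def toy_evaluate(config):
    return (-abs(config["learning_rate"] - 3e-4)
            - abs(config["batch_size"] - 64) / 1000)

best, score = random_search(SEARCH_SPACE, toy_evaluate, n_trials=20)
```

Random search is a common choice over grid search when only a few hyperparameters dominate performance, since each trial explores a fresh value of every parameter.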
Document Type: |
master thesis |
File Description: |
application/pdf |
Language: |
English |
Relation: |
http://hdl.handle.net/2117/405475; 178087 |
Availability: |
http://hdl.handle.net/2117/405475 |
Rights: |
Open Access |
Accession Number: |
edsbas.C62FFB50 |
Database: |
BASE |