Description: |
This thesis concerns sample-efficient embodied machine learning. Machine learning success in sequential decision problems has been limited to domains with a narrow range of goals, and has required orders of magnitude more experience than humans need. Additionally, current methods lack the ability to generalise to new, related goals. In contrast, humans are continual learners. Given their embodiment and computational constraints, humans are forced to reuse knowledge (compressed abstractions of repeated structures present across their lifetime) to tackle novel scenarios in as sample-efficient and safe a manner as possible. Similar traits are desired in robots, which are also embodied learners. Taking inspiration from humans, the central claim of this thesis is that knowledge abstractions acquired from prior experience can be used to design domain-independent, sample-efficient algorithms that improve generalisation across modular domains. We define modular domains as Markov decision processes (MDPs) whose optimal policies can be obtained when reasoning and acting occur over compressed abstractions shared across them. The challenge is how to discover these abstractions sample-efficiently and with minimal supervision. Additionally, for embodied machine learning it is important that the approach supports continuous, potentially unbounded, state-action spaces. Adhering to these constraints, we first develop novel self-supervised (Chapter 3) and weakly-supervised (Chapter 4) knowledge abstraction (domain adaptation) methods for zero-shot generalisation to unseen domains. We demonstrate their potential on robotic applications including sim2real transfer (Chapter 3) and generalisation using a human-robot command interface (Chapter 4). We continue by developing novel unsupervised knowledge abstraction (transfer learning) methods for sample-efficient adaptation to unseen domains (Chapters 5 and 6). We highlight their relevance in robotics and continual learning.
We introduce a hierarchical KL-regularised reinforcement learning (RL) approach based on novel theory behind the transferability-expressivity trade-off of abstractions (Chapter 5) and develop the first, to our knowledge, bottleneck-options approach adhering to the aforementioned embodied machine learning constraints (Chapter 6).