Reinforcement learning describes how a learning agent can achieve optimal behavior based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behavior. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus of the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.
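To make the idea of biasing exploration through reward shaping concrete, the following is a minimal sketch of potential-based shaping, where the agent's reward is augmented with the difference of a potential function over states. The particular potential used here (negative distance to the nearest demonstrated state) and all names are illustrative assumptions, not the paper's actual construction.

```python
def shaping_bonus(phi, s, s_next, gamma=0.99):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).

    Adding F to the environment reward biases exploration toward
    high-potential states without changing the optimal policy.
    """
    return gamma * phi(s_next) - phi(s)


def shaped_reward(r, phi, s, s_next, gamma=0.99):
    """Environment reward r augmented with the shaping bonus."""
    return r + shaping_bonus(phi, s, s_next, gamma)


# A toy potential derived from expert demonstrations: states closer to
# any demonstrated state get higher potential (illustrative only).
demo_states = [(0, 0), (1, 1), (2, 2)]


def demo_potential(s):
    # Negative Manhattan distance to the nearest demonstrated state.
    return -min(abs(s[0] - d[0]) + abs(s[1] - d[1]) for d in demo_states)
```

Under this sketch, a transition that moves the agent closer to a demonstrated state receives a positive bonus, so suboptimal demonstrations merely bias exploration rather than dictating the learned policy.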