way to the goal location. Each test required approximately
15 m of robot motion through the maze.
Evaluation of end-to-end RL on different environment
types with non-holonomic vehicles showed the advantage
of training on more complex environments
(partial braid and perfect mazes) in terms of both
the probability of success and the length of the path
found to the goal. Validation on real hardware demonstrated
that the assumptions made in the E2E-RL formulation
were realistic.
This work was supported by the Natural Sciences and
Engineering Research Council (NSERC) through the
NSERC Canadian Robotics Network (NCRN).
