respect to a given normative system, and assigning
punishments to a MORL agent simultaneously learn-
ing to achieve ethical and non-ethical objectives.
NGRL is more versatile with respect to the
complexity of the norms to be adhered to than
directly assigning rewards to specific events or with
respect to simple constraints. It may be the case that
there is no obvious or coherent way to summarize an
entire normative system by selecting specific events
and assigning punishments to them. By using NGRL,
we expand the kinds of normatively compliant
behaviour that can be learned, and we can specify that
behaviour in a more natural way.
Our experimental results showed that NGRL was
effective in producing an agent that learned to avoid
most violations – even in a stochastic environment –
while still pursuing its non-ethical goal. However,
these results also revealed that we achieve optimal
results when we use NGRL in conjunction with the
normative supervisor as originally intended, as a real-
time compliance-checker. NGRL allows us to circum-
vent the weaknesses of the normative supervision ap-
proach – namely, its inability to preemptively avoid
violations – while normative supervision allows us to
maintain a better guarantee of compliance.
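
The complementary roles described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the experiments. The names `violates`, `ethical_reward`, `compliant_actions`, and `select_action` are hypothetical. The `violates` callback is only a stand-in for a query to the normative supervisor. The lexicographic ordering of the two Q-tables and the fallback to all actions when nothing is compliant are simplifying assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, Action], float]
# Hypothetical stand-in for a normative supervisor query: True if taking
# action `a` in state `s` violates the normative system.
Violates = Callable[[State, Action], bool]


def ethical_reward(violates: Violates, s: State, a: Action,
                   punishment: float = -1.0) -> float:
    """Signal for the ethical objective: punish only flagged state-action pairs."""
    return punishment if violates(s, a) else 0.0


def compliant_actions(violates: Violates, s: State,
                      actions: Sequence[Action]) -> List[Action]:
    """Real-time compliance check: keep only actions that are not flagged.

    Assumption: if every action is flagged, all actions are returned.
    """
    allowed = [a for a in actions if not violates(s, a)]
    return allowed if allowed else list(actions)


def select_action(q_task: QTable, q_eth: QTable, violates: Violates,
                  s: State, actions: Sequence[Action],
                  tol: float = 1e-9) -> Action:
    """Filter by the compliance check, then rank by the two learned objectives.

    Among compliant actions, keep those whose ethical Q-value is maximal
    (within `tol`), then pick the one with the highest task Q-value.
    """
    candidates = compliant_actions(violates, s, actions)
    best_eth = max(q_eth.get((s, a), 0.0) for a in candidates)
    ethical_best = [a for a in candidates
                    if q_eth.get((s, a), 0.0) >= best_eth - tol]
    return max(ethical_best, key=lambda a: q_task.get((s, a), 0.0))
```

In this sketch, the filter removes actions that violate a norm immediately, while the learned ethical Q-values steer the agent away from compliant actions that tend to lead to violations later.
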
As discussed in Sect. 3.3.1, NGRL can be further
developed in its handling of normative conflict and
contrary-to-duty obligations. Moreover, as this
approach applies only to MORL variants of Q-learning,
it will fall prey to the same scaling issues as
Q-learning itself. Adapting NGRL to be used with
Q-learning with function approximation, for example,
will broaden the domains to which NGRL can be applied.
