Machine Learning-Volume 70, pages 166–175. JMLR.org.
Bacchus, F., Boutilier, C., and Grove, A. (1996). Rewarding behaviors. In Proceedings of the National Conference on Artificial Intelligence, pages 1160–1167.
Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, D., and Blundell, C. (2020). Agent57: Outperforming the Atari human benchmark. arXiv preprint arXiv:2003.13350.
Bellemare, M. G., Candido, S., Castro, P. S., Gong, J., Machado, M. C., Moitra, S., Ponda, S. S., and Wang, Z. (2020). Autonomous navigation of stratospheric balloons using reinforcement learning. Nature, 588(7836):77–82.
Bellemare, M. G., Dabney, W., and Munos, R. (2017). A distributional perspective on reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 449–458. JMLR.org.
Brafman, R., De Giacomo, G., and Patrizi, F. (2018). LTLf/LDLf non-Markovian rewards. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
Chevalier-Boisvert, M., Willems, L., and Pal, S. (2018). Minimalistic gridworld environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid.
Giacomo, G. D., Iocchi, L., Favorito, M., and Patrizi, F. (2019). Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications. In Benton, J., Lipovetzky, N., Onaindia, E., Smith, D. E., and Srivastava, S., editors, Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, ICAPS 2019, Berkeley, CA, USA, July 11-15, 2019, pages 128–136. AAAI Press.
Hammond, L., Abate, A., Gutierrez, J., and Wooldridge, M. (2021). Multi-agent reinforcement learning with temporal logic specifications. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 583–592.
Hill, F., Lampinen, A., Schneider, R., Clark, S., Botvinick, M., McClelland, J. L., and Santoro, A. (2020). Environmental drivers of systematicity and generalization in a situated agent. In International Conference on Learning Representations.
Hill, F., Tieleman, O., von Glehn, T., Wong, N., Merzic, H., and Clark, S. (2021). Grounded language learning fast and slow. In International Conference on Learning Representations.
Icarte, R. T., Waldie, E., Klassen, T., Valenzano, R., Castro, M., and McIlraith, S. (2019). Learning reward machines for partially observable reinforcement learning. In Advances in Neural Information Processing Systems, pages 15497–15508.
Illanes, L., Yan, X., Icarte, R. T., and McIlraith, S. A. (2020). Symbolic plans as high-level instructions for reinforcement learning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 30, pages 540–550.
Kupferman, O. and Vardi, M. Y. (2001). Formal Methods in System Design, 19(3):291–314.
Lake, B. M. (2019). Compositional generalization through meta sequence-to-sequence learning. In Advances in Neural Information Processing Systems, pages 9788–9798.
León, B. G. and Belardinelli, F. (2020). Extended Markov games to learn multiple tasks in multi-agent reinforcement learning. In Giacomo, G. D., Catalá, A., Dilkina, B., Milano, M., Barro, S., Bugarín, A., and Lang, J., editors, ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, August 29 - September 8, 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), volume 325 of Frontiers in Artificial Intelligence and Applications, pages 139–146. IOS Press.
León, B. G., Shanahan, M., and Belardinelli, F. (2020). Systematic generalisation through task temporal logic and deep reinforcement learning. CoRR, abs/2006.08767.
León, B. G., Shanahan, M., and Belardinelli, F. (2021). In a nutshell, the human asked for this: Latent goals for following temporal specifications. arXiv preprint arXiv:2110.09461.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
Oh, J., Singh, S., Lee, H., and Kohli, P. (2017). Zero-shot task generalization with multi-task deep reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2661–2670. JMLR.org.
Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., and Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4295–4304. PMLR.
Samvelyan, M., Rashid, T., Schroeder de Witt, C., Farquhar, G., Nardelli, N., Rudner, T. G., Hung, C.-M., Torr, P. H., Foerster, J., and Whiteson, S. (2019). The StarCraft multi-agent challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 2186–2188. International Foundation for Autonomous Agents and Multiagent Systems.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al. (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Toro Icarte, R., Klassen, T. Q., Valenzano, R., and McIlraith, S. A. (2018). Teaching multiple tasks to an RL agent using LTL. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 452–461.
Toro Icarte, R., Waldie, E., Klassen, T., Valenzano, R., Castro, M., and McIlraith, S. (2019). Learning reward ma-
ICAART 2022 - 14th International Conference on Agents and Artificial Intelligence