Table 4: Results of the algorithms in all environments. The columns on the right point to the tested alleged symmetry, indexed
by its label as reported in Table 1 for the toroidal Grid, Table 2 for CartPole and Table 3 for Acrobot. All experiments were
performed with q = 0.1. The true symmetries are displayed in bold.
Environment   Metric   Alleged Transformation

Grid                   TRSAI             SDAI              ODAI              ODWA              TI                TIOD
              ν_k      0.6 ± 0.1         0.0 ± 0.0         0.4 ± 0.1         0.0 ± 0.0         0.5 ± 0.1         0.0 ± 0.0
              ∆        27 ± 9            −14 ± 3           47 ± 12           −14 ± 3           39 ± 9            −17 ± 2

CartPole               SAR               ISR               AI                SFI               TI
              ν_k      0.80 ± 0.05       0.00 ± 0.00       0.00 ± 0.00       0.05 ± 0.03       0.25 ± 0.23
              ∆        4 ± 2 × 10⁻⁴      −10 ± 1 × 10⁻²    −8 ± 1 × 10⁻³     −4 ± 3 × 10⁻³     3 ± 2 × 10⁻⁴

Acrobot                AAVI              CAVI              AI                SSI
              ν_k      0.86 ± 0.03       0.00 ± 0.00       0.35 ± 0.09       0.00 ± 0.00
              ∆        3.3 ± 6.6 × 10⁻³  −3.9 ± 1.9 × 10⁻²  −0.6 ± 1.3 × 10⁻²  −9.5 ± 4.3 × 10⁻²
Artificial Intelligence and Machine Learning group of
the UPF for their warm hospitality.