which therefore increases. Hence, simulated quantum
annealing is applied to a larger formulation. A larger
formulation demands more qubits, which may limit the
accuracy, variation and stability of the quantum
annealing algorithm. This is only an assumption and
needs to be examined more closely. Neumann et al. (2020)
have already stated that Q-RL is limited by the current
Quantum Processing Unit (QPU) size. However, by adding an
Experience Replay Buffer and a Target Network, we are able
to stabilize learning and may therefore reduce the required
QPU size compared to previous approaches.
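To make the two stabilizing components concrete, the following is a minimal sketch of an experience replay buffer and a periodic target-network update. It is not the implementation used in this work: the names ReplayBuffer and sync_target, the transition layout, and the representation of weights as a plain dictionary are assumptions made for illustration only.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions, sampled uniformly (hypothetical helper)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def sync_target(policy_weights, target_weights, step, sync_every):
    """Copy the policy weights into the target weights every `sync_every` steps.

    Weights are assumed to be a dict here; returns True if a copy happened.
    """
    if step % sync_every == 0:
        target_weights.clear()
        target_weights.update(copy.deepcopy(policy_weights))
        return True
    return False
```

In such a setup, learning targets would be computed from the slowly changing target weights, while sampled minibatches from the buffer drive the updates of the policy weights.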
Quantum sampling has been shown to be a promising
method for speeding up reinforcement learning in terms
of the number of time steps required (Neumann et al.,
2020). Further work on the relation between QPU size and
domain complexity (or rather the state input) would be
needed to strictly determine current limitations.
ACKNOWLEDGEMENTS
This work was funded by the BMWi project PlanQK
(01MK20005I).
REFERENCES
Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A
learning algorithm for Boltzmann machines. Cognitive
Science, 9(1):147–169.
Benedetti, M., Realpe-Gómez, J., and Perdomo-Ortiz, A.
(2018). Quantum-assisted Helmholtz machines: A
quantum–classical deep learning framework for
industrial datasets in near-term devices. Quantum
Science and Technology, 3(3):034007.
Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe,
N., and Lloyd, S. (2017). Quantum machine learning.
Nature, 549(7671):195–202.
Binmore, K. (2007). Game Theory: A Very Short
Introduction. Oxford University Press.
Charpentier, A., Elie, R., and Remlinger, C. (2020).
Reinforcement learning in economics and finance.
Crawford, D., Levit, A., Ghadermarzy, N., Oberoi, J. S.,
and Ronagh, P. (2019). Reinforcement learning using
quantum Boltzmann machines.
Foerster, J. N., Assael, Y. M., de Freitas, N., and Whiteson,
S. (2016). Learning to communicate with deep
multi-agent reinforcement learning.
Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and
Whiteson, S. (2017). Counterfactual multi-agent
policy gradients. In AAAI.
Jerbi, S., Trenkwalder, L. M., Nautrup, H. P., Briegel, H. J.,
and Dunjko, V. (2020). Quantum enhancements for
deep reinforcement learning in large spaces.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov,
M., Tunyasuvunakool, K., Ronneberger, O., Bates,
R., Žídek, A., Bridgland, A., Meyer, C., Kohl, S.
A. A., Potapenko, A., Ballard, A. J., Cowie, A.,
Romera-Paredes, B., Nikolov, S., Jain, R., Adler,
J., Back, T., Petersen, S., Reiman, D., Steinegger,
M., Pacholska, M., Silver, D., Vinyals, O., Senior,
A. W., Kavukcuoglu, K., Kohli, P., and Hassabis, D.
(2020). High accuracy protein structure prediction
using deep learning. In Fourteenth Critical Assessment
of Techniques for Protein Structure Prediction
(Abstract Book), 14.
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Sallab, A.
A. A., Yogamani, S., and Pérez, P. (2020). Deep
reinforcement learning for autonomous driving: A survey.
Levit, A., Crawford, D., Ghadermarzy, N., Oberoi, J. S.,
Zahedinejad, E., and Ronagh, P. (2017). Free
energy-based reinforcement learning using a quantum
processor.
Li, R. Y., Di Felice, R., Rohs, R., and Lidar, D. A. (2018).
Quantum annealing versus classical machine learning
applied to a simplified computational biology
problem. npj Quantum Information, 4(1).
Mahmud, M., Kaiser, M. S., Hussain, A., and Vassanelli,
S. (2018). Applications of deep learning and
reinforcement learning to biological data. IEEE
Transactions on Neural Networks and Learning Systems,
29(6):2063–2079.
McClelland, J., McNaughton, B., and O’Reilly, R. (1995).
Why there are complementary learning systems in the
hippocampus and neocortex: Insights from the
successes and failures of connectionist models of learning
and memory. Psychological Review, 102:419–57.
Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J., Songhori,
E., Wang, S., Lee, Y.-J., Johnson, E., Pathak, O., Bae,
S., Nazi, A., Pak, J., Tong, A., Srinivasa, K., Hang, W.,
Tuncer, E., Babu, A., Le, Q. V., Laudon, J., Ho, R.,
Carpenter, R., and Dean, J. (2020). Chip placement
with deep reinforcement learning.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J.,
Bellemare, M., Graves, A., Riedmiller, M., Fidjeland,
A., Ostrovski, G., Petersen, S., Beattie, C., Sadik,
A., Antonoglou, I., King, H., Kumaran, D., Wierstra,
D., Legg, S., and Hassabis, D. (2015). Human-level
control through deep reinforcement learning. Nature,
518:529–33.
Morita, S. and Nishimori, H. (2008). Mathematical
foundation of quantum annealing. Journal of Mathematical
Physics, 49(12):125210.
Neukart, F., Compostella, G., Seidel, C., von Dollen, D.,
Yarkoni, S., and Parney, B. (2017a). Traffic flow
optimization using a quantum annealer.
Neukart, F., Dollen, D. V., Seidel, C., and Compostella, G.
(2017b). Quantum-enhanced reinforcement learning
for finite-episode games with discrete state spaces.
Neumann, N., Heer, P., Chiscop, I., and Phillipson, F.
(2020). Multi-agent reinforcement learning using
simulated quantum annealing.
Neven, H., Denchev, V. S., Rose, G., and Macready, W. G.
(2008). Training a binary classifier with the quantum
adiabatic algorithm.
Towards Multi-agent Reinforcement Learning using Quantum Boltzmann Machines