9 CONCLUSIONS AND FUTURE WORK
In this paper, we addressed the low learning efficiency of the previous method, which applies fuzzy Q-learning to Ms. PacMan. To improve the learning efficiency, we proposed UCB fuzzy Q-learning, which combines fuzzy Q-learning with the UCBQ algorithm: by taking into account the number of times each action has been selected, it balances exploration and exploitation and avoids local optimal solutions. In the experiment, the proposed method was applied to Ms. PacMan, and its score exceeded that of the previous method by about 100 points. It was also shown that a smaller value of the constant used in the proposed method resulted in a higher score.
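To illustrate the idea, the following is a minimal Python sketch of UCB-style action selection on top of tabular Q-learning. The constant C, the learning rate, the discount factor, the initial Q value, and the update rule are illustrative assumptions, not the exact formulation of UCB fuzzy Q-learning used in this paper.

import math
import random
from collections import defaultdict

C = 0.5        # exploration constant (a smaller value scored higher in our experiments)
ALPHA = 0.1    # learning rate (assumed for illustration)
GAMMA = 0.9    # discount factor (assumed for illustration)
Q0 = 50.0      # initial Q value (the Q tables in this paper start around 50)

Q = defaultdict(lambda: Q0)   # Q[(state, action)]
n = defaultdict(int)          # n[(state, action)]: times the action was selected in the state
N = defaultdict(int)          # N[state]: total number of selections in the state

def select_action(state, actions):
    """Choose the action maximizing Q plus a UCB bonus; try unselected actions first."""
    untried = [a for a in actions if n[(state, a)] == 0]
    if untried:
        return random.choice(untried)
    return max(actions,
               key=lambda a: Q[(state, a)]
               + C * math.sqrt(math.log(N[state]) / n[(state, a)]))

def update(state, action, reward, next_state, actions):
    """Standard Q-learning update plus the selection counts used by the UCB bonus."""
    n[(state, action)] += 1
    N[state] += 1
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

Because the bonus term shrinks as the selection count n grows, rarely selected actions keep being tried in the early stage of learning, which is how the selection counts steer the balance between exploration and exploitation.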
Our future work is to clarify the difference in scores between our method and the previous method by extending the program to handle the maps of stage 3 and later of Ms. PacMan. In addition, since the influence of action selection in UCB fuzzy Q-learning is expected to grow with the number of actions, it is necessary to verify whether the score increases when the actions are divided more finely. Also, to confirm the learning efficiency, it is necessary to obtain scores over many learning runs on each machine and compare their transitions. Finally, to confirm its usefulness, the proposed method should also be compared with DQN.
REFERENCES
Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-
time Analysis of the Multiarmed Bandit Problem.
Machine Learning, 47, 235-256.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv, 1606.01540.
DeLooze, L. L., & Viner, W. R. (2009). Fuzzy Q-Learning
in a Nondeterministic Environment: Developing an
Intelligent Ms. Pac-Man Agent. Proc. IEEE Symposium
on Computational Intelligence and Games, 162-169.
Glorennec, P. Y. (1994). Fuzzy Q-Learning and Dynamical Fuzzy Q-Learning. Proc. IEEE International Fuzzy Systems Conference, 26-29.
Gu, D., & Hu, H. (2004). Accuracy based fuzzy Q-learning
for robot behaviours. Proc. IEEE International Fuzzy
Systems Conference, 25-29.
Hu, Y., Li, W., Xu, H., & Xu, G. (2015). An Online
Learning Control Strategy for Hybrid Electric Vehicle
Based on Fuzzy Q-Learning. Energies, 8(10), 11167-
11186.
Katagami, D., Toriumi, H., Osawa, H., Kano, Y., Inaba, M., & Otsuki, T. (2018). Introduction to Research on Artificial Intelligence Based Werewolf. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics (in Japanese), 30(5), 236-244.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013).
Playing Atari with Deep Reinforcement Learning.
arXiv, 1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., & Graves, A. (2015). Human-Level Control through Deep Reinforcement Learning. Nature, 518, 529-533.
Nakajima, T., Udo, A., & Ishibuchi, H. (2003). Acquisition of Soccer Agent Behavior through Fuzzy Q-Learning. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics (in Japanese), 15(6), 702-707.
Saito, K., Notsu, A., & Honda, K. (2014). The Effect of
UCB Algorithm in Reinforcement Learning. Proc.
Fuzzy System Symposium, 174-179.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., & Schrittwieser, J. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
Tarjan, R. (1972). Depth-First Search and Linear Graph Algorithms. SIAM Journal on Computing, 1(2), 146-160.
Umano, M., Tachino, H., & Ise, A. (2013). Application of
Fuzzy Q-learning to Car Racing Game. Proc. Fuzzy
System Symposium, 1006-1011.
van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., & Tsang, J. (2017). Hybrid Reward Architecture for Reinforcement Learning. arXiv, 1706.04208.
Watkins, C. J. (1989). Learning from Delayed Rewards. PhD Thesis, University of Cambridge.
Watkins, C. J., & Dayan, P. (1992). Q-Learning. Machine
Learning, 8, 279-292.
Figure 7: Q values of machine 2.

State  Closest Ghost  Closest Pill  Closest PPill  GoTo Pill  GoTo PPill  Avoid Ghost
  1    near           near          near               44         30          11
  2    near           near          middle              5         50          50
  3    near           near          far                30         49          29
  4    near           middle        near               60        110          74
  5    near           middle        middle             26         39          53
  6    near           middle        far                40         27          50
  7    near           far           near               72         99          50
  8    near           far           middle             50         50          50
  9    near           far           far                23         50          51
 10    middle         near          near               80         63          44
 11    middle         near          middle             83         50          30
 12    middle         near          far                47         50          50
 13    middle         middle        near               89        108          42
 14    middle         middle        middle             50         73          44
 15    middle         middle        far                67         50          59
 16    middle         far           near               50         50          50
 17    middle         far           middle             48         53          50
 18    middle         far           far                50         50          50
 19    far            near          near              103         82          50
 20    far            near          middle             64         50          50
 21    far            near          far                42         50          50
 22    far            middle        near               31         10          36
 23    far            middle        middle             50         71          50
 24    far            middle        far                50         59          50
 25    far            far           near               50         50          50
 26    far            far           middle             51         46          48
 27    far            far           far                50         50          50
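For reference, in fuzzy Q-learning the per-state Q values above are not used in isolation: the value of each action in the current game situation is obtained by weighting every state's Q values with the degree to which that state's fuzzy labels match the observed distances. The following is a minimal Python sketch of such an aggregation; the membership functions, their breakpoints, and the product used as the firing strength are illustrative assumptions, not the exact definitions used in this paper.

ACTIONS = ["GoTo Pill", "GoTo PPill", "Avoid Ghost"]
LABELS = ["near", "middle", "far"]

def membership(distance):
    """Toy memberships for near/middle/far given a non-negative distance (assumed shapes)."""
    near = max(0.0, 1.0 - distance / 10.0)
    far = min(1.0, max(0.0, (distance - 10.0) / 10.0))
    middle = max(0.0, 1.0 - near - far)
    return {"near": near, "middle": middle, "far": far}

def fuzzy_q(q_table, d_ghost, d_pill, d_ppill):
    """Aggregate rule Q values weighted by each rule's firing strength.

    q_table maps (ghost_label, pill_label, ppill_label) -> {action: Q},
    i.e. the 27 rows of the table above.
    """
    mg, mp, mpp = membership(d_ghost), membership(d_pill), membership(d_ppill)
    totals = {a: 0.0 for a in ACTIONS}
    weight_sum = 0.0
    for lg in LABELS:
        for lp in LABELS:
            for lpp in LABELS:
                w = mg[lg] * mp[lp] * mpp[lpp]   # firing strength of this state (product)
                if w == 0.0:
                    continue
                weight_sum += w
                for a in ACTIONS:
                    totals[a] += w * q_table[(lg, lp, lpp)][a]
    return {a: totals[a] / weight_sum for a in ACTIONS}

The greedy action for the current situation is then simply the action with the largest aggregated value returned by fuzzy_q.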