Finally, the control variable θ is the throttle applied to the car, which follows the control law:
\[
\theta(S_t) \;=\; -\sum_{i=1}^{n} \mu_i \,\omega_i(S_t)\,\frac{\partial J_i}{\partial \theta(S_{t-1})} \qquad (22)
\]
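For clarity, the following Python sketch spells out how a throttle command could be assembled from Eq. (22), assuming the importance weights μ_i, the state-dependent weights ω_i(S_t), and numerical estimates of ∂J_i/∂θ(S_{t−1}) are already available as arrays; the function and variable names are hypothetical and not part of the original implementation.

```python
import numpy as np

def throttle(mu, omega, dJ_dtheta):
    """Sketch of the control law of Eq. (22):
    theta(S_t) = -sum_i mu_i * omega_i(S_t) * dJ_i/dtheta(S_{t-1}).

    mu         -- importance weights mu_i, one per goal
    omega      -- state-dependent weights omega_i(S_t)
    dJ_dtheta  -- estimated derivatives of each performance index J_i
                  with respect to the previous throttle theta(S_{t-1})
    """
    mu, omega, dJ_dtheta = map(np.asarray, (mu, omega, dJ_dtheta))
    return -np.sum(mu * omega * dJ_dtheta)

# Illustrative call with two competing goals (values are arbitrary).
theta_t = throttle(mu=[0.6, 0.4], omega=[0.8, 0.3], dJ_dtheta=[0.05, -0.12])
```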
3.3.2 Experimental Setup
Although most examples in Reinforcement Learning use a reward function that takes only the values (−1, 0, +1), we use a continuous reward function defined as an inverse function of the sum of the performance indexes, so that high rewards correspond to small errors in the performance indexes.
\[
\mathrm{reward} \;=\; \frac{1}{1 + \sum_{i} J_i} \qquad (23)
\]
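As a concrete reading of Eq. (23), the short sketch below computes the reward from the accumulated performance indexes; the helper name and the example values are illustrative and not taken from the original experiments.

```python
def reward(performance_indexes):
    """Continuous reward of Eq. (23): reward = 1 / (1 + sum_i J_i).

    performance_indexes -- iterable with the non-negative indexes J_i.
    The reward tends to 1 as the total error tends to 0 and to 0 as
    the error grows, so high rewards mean small errors.
    """
    return 1.0 / (1.0 + sum(performance_indexes))

print(reward([0.1, 0.05]))  # ~0.87: small errors give a reward close to 1
print(reward([10.0, 5.0]))  # ~0.06: large errors drive the reward toward 0
```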
3.3.3 Results
The simulations were carried out using the Reinforcement Learning Framework (Sutton, 2006). In the Conflicting Mountain Car Problem, the results indicate that an agent following the presented goal coordination approach learns a near-optimal policy for reaching the goal by successive approximations.
Figure 3 shows the result of the simulation: the number of steps needed to reach the goal decreases until it settles at an approximately constant value, which corresponds to the near-optimal strategy for controlling the agent.
[Plot: steps to reach the goal versus episode number (top, episodes 0–100); steps to reach the goal over the last 50 episodes (bottom).]
Figure 3: Mountain Car experimental results.
Figure 3 (top) shows that the number of steps to reach the goal is close to 1000 in the first episodes and decreases considerably in the subsequent episodes; indeed, Figure 3 (bottom) shows that the system reaches a stable configuration of around 107 steps per episode. An additional observation is that learning is fast: from approximately episode 20 onwards, the system is practically stabilized around 107 steps per episode.
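A minimal, hypothetical skeleton of the evaluation loop that produces this kind of learning curve is sketched below; it only records the number of steps per episode, and the `env`/`agent` interfaces are placeholders rather than the actual API of the framework used in the experiments (Sutton, 2006).

```python
def run_experiment(env, agent, num_episodes=100, max_steps=1000):
    """Record the steps needed to reach the goal in each episode,
    i.e. the quantity plotted in Figure 3. `env` and `agent` are
    placeholder objects with the usual reset/step and act/learn methods.
    """
    steps_per_episode = []
    for _ in range(num_episodes):
        state = env.reset()
        for step in range(1, max_steps + 1):
            action = agent.act(state)
            next_state, r, done = env.step(action)
            agent.learn(state, action, r, next_state, done)
            state = next_state
            if done:  # the car reached the goal
                break
        steps_per_episode.append(step)
    return steps_per_episode
```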
4 CONCLUSIONS AND FURTHER
WORK
A general framework for the coordination of multiple competing goals in dynamic environments for physical agents has been presented. This approach to goal coordination is a novel tool for incorporating a deep coordination ability into purely reactive agents.
The framework was tested on two test problems, with satisfactory results. Future experiments are planned for a wider range of problems, including differential games, humanoid robotics and modular robots.
We are also interested in studying reinforcement learning methods better suited to continuous input states and multiple continuous actions. Usually the input and output variables are discretized, but in some cases better results could be obtained if the problem were modeled by means of continuous states, yielding more robust systems.
REFERENCES
Albus, J. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). J. of Dynamic Sys., Meas. and Control, pages 220–227.
Fonseca, C. M. and Fleming, P. J. (1995). An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation, 3(1):1–16.
Isaacs, R. (1999). Differential Games. Dover Publications.
Passino, K. (2005). Biomimicry for Optimization, Control, and Automation. Springer Verlag.
Sutton, R. (2006). Reinforcement learning and artificial intelligence. http://rlai.cs.ualberta.ca/RLAI/rlai.html.
Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press.
Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems, volume 8, pages 1038–1044. MIT Press.
Zitzler, E., Laumanns, M., Thiele, L., and Fonseca, C. (2002). Why quality assessment of multiobjective optimizers is difficult. In Proc. GECCO 2002, pages 666–674.