construction. In this situation, the life-span of an agent should be relatively long, lasting at least until any reconfiguration of the factory system is required or a change occurs in its operating environment. This study focuses on factory-based construction manufacturing, specifically precast reinforced concrete (PRC) component production.
Optimization of customized PRC component production has been considered by several researchers (Leu & Hwang, 2001; Chan & Hu, 2002; Benjaoran & Dawood, 2005), who used genetic algorithms (GAs) to improve production performance. Although the approach was shown to be successful, heuristic search methods such as GAs are computationally expensive and are therefore not well suited to situations where decisions must be made quickly.
RL solutions based on a learned model, such as that developed by Shitole et al. (2019), can generate rapid solutions to a decision problem once trained. A number of authors have applied this method to the control of factory operations (Waschneck et al., 2018; Zhou et al., 2020; Xia et al., 2021) and found the results promising compared with more conventional approaches such as rules-of-thumb. Unfortunately, these applications have been outside construction manufacturing and therefore do not address many of the challenges of this industry, although Waschneck et al. (2018) did address the problem of customization within the semiconductor industry.
The objective of this paper is to explore the potential of RL-based modelling as a means of controlling factory-based construction manufacturing, given the unique demands of construction projects.
2  DYNAMIC SYSTEM CONTROL 
2.1  Decision Agents 
The future path of a construction manufacturing system is determined by both controllable and uncontrollable events. The controllable events provide an opportunity to steer this path along a line that is favourable to the manufacturer, optimizing performance in terms of, say, productivity and/or profit. This is achieved through the selection of an appropriate sequence of decisions wherever options exist. Examples of such decisions include prioritizing jobs in a queue, deciding when to take an item of equipment offline for maintenance, and selecting the number of machines to allocate to a process.
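As a minimal sketch (all names and values here are hypothetical, not taken from any real production system), decision points of this kind can be represented as a discrete action set from which an agent selects at each control step:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    """Hypothetical action types for a PRC production-line controller."""
    PRIORITIZE_JOB = auto()     # move a job forward in the queue
    TAKE_OFFLINE = auto()       # remove equipment for maintenance
    ALLOCATE_MACHINES = auto()  # change machines assigned to a process

@dataclass
class Decision:
    action: Action
    target: str     # e.g. a job id, machine id, or process name
    value: int = 1  # e.g. a queue position or machine count

# An agent emits a sequence of such decisions over the system's life.
plan = [
    Decision(Action.PRIORITIZE_JOB, target="job-17", value=1),
    Decision(Action.TAKE_OFFLINE, target="crane-2"),
    Decision(Action.ALLOCATE_MACHINES, target="casting", value=3),
]
```

Framing the decisions this way makes the agent's output explicit: whatever its internal method, it ultimately selects one of a finite set of action types and applies it to a concrete target.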
These decisions are made by one or more agents, as illustrated in Figure 1, which operate dynamically throughout the life of the system. An agent monitors relevant variables defining the state of the system and its environment (both current and possibly past states, and even predictions about future states), then uses these insights to decide on appropriate future actions to implement. Typically, these actions will concern events in the immediate future (given that the most relevant, accurate, and valuable information is available at the time of the decision) but can also be applied to events later in the future for decisions that have a long lead time.
 
Figure 1: Decision agent control of dynamic system. 
An important dichotomy of decision agents is search-based versus experience-based systems. Search-based agents, which include blind and heuristic methods, use a systematic exploration of the solution space in search of the best attainable action. They tailor a solution to the specific instance of the problem at hand. As such, they may find better optimized solutions than experience-based agents, although this remains to be tested. Search-based agents are also highly extensible, meaning they can be easily adapted to new versions of the problem. On the downside, they can be computationally expensive and thus not suited to situations requiring rapid decision making.
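To make this concrete, the following sketch (with hypothetical job names and durations) shows a blind search-based agent that sequences jobs by exhaustively enumerating every ordering and keeping the one with the lowest total completion time. The answer is tailored to the exact instance, but the cost grows factorially with the number of jobs, which is why such methods struggle when decisions are needed quickly:

```python
from itertools import permutations

def total_completion_time(order, durations):
    """Sum of completion times for jobs processed in the given order."""
    elapsed, total = 0, 0
    for job in order:
        elapsed += durations[job]
        total += elapsed
    return total

def search_based_agent(durations):
    """Blind search: evaluate all n! job orderings and return the best.
    Tailored to the specific instance, but computationally expensive."""
    best = min(permutations(durations),
               key=lambda order: total_completion_time(order, durations))
    return list(best)

jobs = {"slab": 4, "beam": 2, "column": 3}
print(search_based_agent(jobs))  # ['beam', 'column', 'slab']
```

With three jobs only six orderings exist, but at ten jobs the same loop already evaluates over 3.6 million sequences, illustrating why the paper notes that search-based agents are ill-suited to rapid decision making.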
In contrast, experience-based agents, which include rules-of-thumb and artificial neural networks (ANNs), make decisions based on exposure to similar situations from the past. Once developed, an experience-based agent can output decisions rapidly. However, because the solutions they offer are generic rather than tailored to each situation, their decisions may not be as well optimized as those of search-based agents. Furthermore, experience-based agents tend to lack extensibility; each new version of the problem requires redevelopment of the agent, which in turn requires the acquisition and assimilation of large volumes of new information on system behaviour.
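A rule-of-thumb illustrates the experience-based side of the dichotomy. The sketch below (same hypothetical job data as the scheduling example above) applies the classic shortest-processing-time (SPT) rule: sort the queue once and process jobs in that order. The decision is near-instant, but the same generic rule is applied to every instance rather than searching for an instance-specific optimum:

```python
def experience_based_agent(durations):
    """Shortest-processing-time rule-of-thumb: one sort, no search.
    Fast and generic, rather than tailored to the instance at hand."""
    return sorted(durations, key=durations.get)

jobs = {"slab": 4, "beam": 2, "column": 3}
print(experience_based_agent(jobs))  # ['beam', 'column', 'slab']
```

For this simple objective (total completion time on a single machine) the SPT rule happens to be provably optimal, so it matches what an exhaustive search would find at a fraction of the cost; for richer objectives or constraints, however, a fixed rule can be arbitrarily far from the optimum, which is the trade-off the paragraph above describes.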