3 PROGRAMMING
A dynamically reconfigurable device would not be
functional if it could not be programmed.
The functional blocks and switching networks each
have a variable component that can be programmed.
Programming is done through a series of program
blocks connected into a chain: the output of one
block is connected to the input of the next.
Once the device is in programming mode, the data in
the program chain shifts by one block with each
clock. Programming the whole device therefore takes
as many clocks as there are blocks. Figure 3
shows a program chain.
Figure 3: Programming chain.
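The shifting behaviour of the program chain can be sketched in a few lines of Python. This is only an illustrative model, not the device's actual implementation: the names `shift_chain` and `program_device` are hypothetical, and the order in which words enter the chain (the word for the last block first) is an assumption.

```python
def shift_chain(chain, word_in):
    """One clock in programming mode: every block hands its word to the next block."""
    return [word_in] + chain[:-1]


def program_device(words):
    """Shift a full configuration into a chain that has one block per word."""
    chain = [None] * len(words)  # unprogrammed chain
    clocks = 0
    # Assumption: the word meant for the last block is shifted in first.
    for word in reversed(words):
        chain = shift_chain(chain, word)
        clocks += 1
    return chain, clocks


words = ["input", "adder_1", "multiplier", "adder_2", "output"]
chain, clocks = program_device(words)
print(chain, clocks)
```

In this model the number of clocks spent always equals the number of blocks in the chain, which matches the cost stated above.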
The size of the configuration to be loaded
depends on the size of the reconfigurable device.
An 8x8 element matrix contains 64 functional units
and 64 switching networks, so programming it
requires 128 programming words. So that no time is
lost to programming, one more register is added to
each program block. This register is called the
preload register for configuration loading. The real
program chain consists of these preload registers.
Once the whole configuration has been loaded into
the program chain, the configuration of the whole
device is changed at a chosen clock by transferring
the data from the preload registers into the
configuration registers. The function of the device
thus appears to change within a single clock. While
a configuration is being loaded into the preload
registers, the reconfigurable component can continue
to perform its current function, unhindered by the
loading of the new configuration.
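The interplay between preload and configuration registers can be illustrated with a small Python model. It is a sketch under stated assumptions: the class `Device` and the helper `load` are hypothetical names, configuration words are represented by arbitrary Python values, and the word for the last block is assumed to enter the chain first.

```python
class Device:
    """Toy model of a program chain with preload and configuration registers."""

    def __init__(self, n_blocks):
        self.preload = [None] * n_blocks  # the shift chain, loaded in the background
        self.config = [None] * n_blocks   # the registers that drive the hardware

    def shift_in(self, word):
        """One clock of background loading; the active configuration is untouched."""
        self.preload = [word] + self.preload[:-1]

    def activate(self):
        """One clock: copy every preload register into its configuration register."""
        self.config = list(self.preload)


def load(device, words):
    """Shift a whole configuration into the preload chain; returns clocks spent."""
    for word in reversed(words):  # assumption: last block's word enters first
        device.shift_in(word)
    return len(words)


N_BLOCKS = 128  # 64 functional units + 64 switching networks of an 8x8 matrix
device = Device(N_BLOCKS)
clocks = load(device, [("cfg_A", i) for i in range(N_BLOCKS)])
device.activate()
print(clocks)
```

Loading costs one clock per word, but the switch to the new function happens in the single `activate` step, while `config` stays stable during the whole load.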
4 COMPUTATION EXAMPLES
To demonstrate the functionality and applicability of
this type of architecture, we will analyse its
application in two examples. The first example is the
calculation of a simple function:
$y = (a + b) \times c - d$.
This example is used to explain the basic
principles of how the device functions. The
dynamically reconfigurable device consists of a
single input block, a single output block and three
functional units: two adders and one multiplier. To
keep the construction simple, we do not use a
larger number of elements. First comes an adder,
then the multiplier and then the other adder. The
input block is located at the entry to the device, and
the output block at its exit.
The program chain corresponds to the element
layout: it begins with the input block, which is
followed by the adder, the multiplier and the second
adder, and ends in the output block. The input block
sends parameter a through the dout_0 exit,
parameter b through dout_1, parameter c through
dout_2 and parameter d through dout_4. In the first
addition block, the value $a + b$ is calculated. The
result of the $a + b$ addition is then routed into the
multiplier, where it is multiplied by the c parameter
value. The result of the multiplication is routed to
the other adder, where the negated d parameter value
is added to it. The result obtained in the second
adder is stored in the output block.
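Ignoring timing for the moment, the data flow through the three functional units can be written down directly. The following Python sketch is only an illustration; the function name `evaluate` and the dictionary keys are hypothetical labels for the blocks, not identifiers from the device.

```python
def evaluate(a, b, c, d):
    """Untimed data flow of the example: adder, multiplier, second adder."""
    s1 = a + b      # first adder
    s2 = s1 * c     # multiplier
    y = s2 + (-d)   # second adder: negated d is added to the product
    return {"adder_1": s1, "multiplier": s2, "adder_2": y}


print(evaluate(1, 2, 3, 4))
```

The returned dictionary keeps the intermediate value of each block next to the final result.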
The first step when programming the
reconfigurable device is setting the data route delays
so that data arrives at the function blocks in
synchrony. The configuration of the multiplier
specifies that the A and B entry values are multiplied
and that no scaling of the result is necessary. It also
specifies that the B entry has to be delayed by
one clock. The delay is necessary in order to achieve
synchronisation: since the a, b and c parameters
leave the input block at the same time, the c
parameter reaches the multiplier before the sum
$a + b$. It is for that reason that the c parameter is
purposefully delayed by one clock. The switching
matrix is programmed so that it allows the in_w
signal to enter through the A entry of the functional
unit, while the in_s signal enters through the B entry.
The multiplication results are sent to out_s and out_n.
Finally, the second adder is configured so that
the B entry of the functional unit is negated and
added to the A entry value. There is no delay on the
A entry, while the B entry is delayed by two clocks.
The reason for the delay is the same as for the
multiplier: the A entry receives the result of the
$(a + b) \times c$ operation, which required two
clocks, so the d parameter on the B entry has to wait
for it. The switching matrix takes the A parameter
from the in_i input and the B parameter from the in_s
input. The result is sent to the out_s output. The final
result and the intermediate results from every block
all arrive at the output block. In order to synchronise
the final result with the intermediate results, an
additional delay was added at the exit, which
synchronises the loading of data into the output block.
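The role of the entry delays can be checked with a small clock-by-clock model in Python. It is a sketch under assumptions that go beyond the text: every functional unit is assumed to register its result, so it adds exactly one clock of latency, a new set of parameters is assumed to enter on every clock, and the names `delay`, `functional_unit` and `run` are hypothetical.

```python
import operator


def delay(stream, clocks):
    """Delay a per-clock stream by the given number of clocks (None = no data)."""
    return ([None] * clocks + list(stream))[:len(stream)]


def functional_unit(op, a_in, b_in, a_delay=0, b_delay=0):
    """Apply op to the (optionally delayed) A and B entries; result appears one clock later."""
    a = delay(a_in, a_delay)
    b = delay(b_in, b_delay)
    out = [None]  # output register is empty during the first clock
    for x, y in zip(a, b):
        out.append(None if x is None or y is None else op(x, y))
    return out[:len(a_in)]


def run(samples, mul_b_delay=1, add2_b_delay=2):
    """Stream (a, b, c, d) tuples through adder, multiplier and second adder."""
    pad = [None] * 3  # three extra clocks to flush the three units
    a, b, c, d = (list(col) + pad for col in zip(*samples))
    s1 = functional_unit(operator.add, a, b)
    s2 = functional_unit(operator.mul, s1, c, b_delay=mul_b_delay)
    # Subtraction models the second adder with its B entry negated.
    y = functional_unit(operator.sub, s2, d, b_delay=add2_b_delay)
    return [v for v in y if v is not None]


samples = [(1, 2, 3, 4), (5, 6, 7, 8), (2, 0, 10, 1)]
print(run(samples))                 # delays as described in the text
print(run(samples, mul_b_delay=0))  # c not delayed: operands from different samples meet
```

With the delays from the text (one clock on the multiplier's B entry, two clocks on the second adder's B entry), each output equals $(a + b) \times c - d$ for a single input tuple. Removing the multiplier delay makes the sum of one tuple meet the c value of the next, so the results no longer correspond to any single input.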