Occopus will decide randomly between node definitions when provisioning new node instances, so we propose using the updated infrastructure description method instead. In this case, the user can explicitly define how many Kafka nodes should be created in the ELKH Cloud, in Azure, or in any other cloud provider for which a Kafka node definition exists.
The updated infrastructure description providing Kafka nodes in both the ELKH Cloud and Azure can be found in the Gitlab repository for the paper (Farkas, 2021), in the kafka.vm directory, in the file infra-kafka-hybrid.yaml. This infrastructure description shows the structure of the infrastructure: one Zookeeper node is used, and there are two Kafka node variants, one (as earlier) for the ELKH Cloud, and one for Azure. Both the ELKH Cloud and Azure variants are set to scale between at least 2 and at most 10 instances.
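Following the general Occopus infrastructure description format, such a hybrid setup could be sketched roughly as below. The user id, node names, and node definition type names are illustrative placeholders; the authoritative version is infra-kafka-hybrid.yaml in the paper's repository.

```yaml
user_id: someone@example.com      # placeholder
infra_name: kafka-hybrid
nodes:
  - &Z
    name: zookeeper
    type: zookeeper_node          # assumed node definition name
    scaling:
      min: 1
      max: 1
  - &KE
    name: kafka-elkh
    type: kafka_elkh_node         # ELKH Cloud-backed Kafka variant
    scaling:
      min: 2
      max: 10
  - &KA
    name: kafka-azure
    type: kafka_azure_node        # Azure-backed Kafka variant
    scaling:
      min: 2
      max: 10
dependencies:
  - [ *KE, *Z ]                   # Kafka nodes depend on Zookeeper
  - [ *KA, *Z ]
```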
If an ELKH Cloud user has already created an infrastructure based on the ELKH Cloud-only Apache Kafka reference architecture, then Occopus enables extending the available infrastructure with the new, additional Azure-based Kafka nodes. The user simply has to instruct Occopus to build an infrastructure based on the existing one, using the new infrastructure description. Occopus will check whether the nodes of the existing infrastructure are still alive (and will provision new ones if not), and will also create the Azure-based Kafka node instances.
The node definition part belonging to the updated, hybrid infrastructure depends on the target cloud used. In the following subsections we examine how the extension can be implemented in Azure using virtual machines and in Azure using container instances. Occopus, among other cloud types, supports both of these.
4.1 Azure VM-based Kafka Nodes
Azure-based virtual machines in Occopus are handled by the azure_vm resource handler plugin. Through this plugin, users can instantiate virtual machines in the different regions of Azure; cloud-init scripts are available for contextualization, just as in the case of the OpenStack-based ELKH Cloud. The Occopus documentation provides detailed instructions on how to prepare a node definition for the azure_vm plugin, so here we only summarize the necessary steps:
• set the authentication data to be used with Azure (using an Azure service principal),
• set the node properties, such as region, virtual machine size, and publisher,
• create the contextualization script for the service.
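Put together, a node definition for the azure_vm plugin could look roughly like the following sketch. All values, and some of the attribute names, are placeholders that should be checked against the Occopus azure_vm plugin documentation rather than taken as the definition used in the paper.

```yaml
'node_def:kafka_azure_node':
  - resource:
      type: azure_vm
      endpoint: https://management.azure.com    # Azure Resource Manager endpoint
      resource_group: my-kafka-rg               # placeholder resource group
      location: westeurope                      # target Azure region
      vm_size: Standard_DS1_v2                  # virtual machine size (placeholder)
      publisher: Canonical                      # image publisher (placeholder)
      offer: UbuntuServer                       # image offer (placeholder)
      sku: 18.04-LTS                            # image SKU (placeholder)
    contextualisation:
      type: cloudinit
      context_template: !yaml_import
        url: file://cloud_init_kafka_azure.yaml # the cloud-init script
```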
As already mentioned, the Occopus documentation covers most cases. Also, as Azure provides cloud-init-based contextualization, the contextualization script used for the ELKH Cloud-based nodes can very likely also be used for Azure.
However, a slight modification is necessary. As shown in Figure 3, the Kafka nodes communicate with each other. As the ELKH Cloud-based instances and the Azure-based instances run on completely different networks, each node in this hybrid scenario must have a public IP address assigned, and this public IP address should be used both as the advertised listener and as the listener for inter-broker communication. Most cloud providers offer a way to query, from inside the virtual machine, the public IP address associated with it, but the method differs between clouds. Thus, the contextualization script of the Azure-based instances differs from that of the ELKH Cloud-based ones in this respect: for querying the public IP address of the VM, the Azure Instance Metadata Service (IMDS, 2021) is used.
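For example, the public IP address can be retrieved from IMDS and wired into the broker configuration with a cloud-init fragment like the one below. The Kafka installation path and the exact api-version value are assumptions for illustration, not taken from the paper's repository.

```yaml
#cloud-config
runcmd:
  - |
    # Query the VM's public IP address from the Azure Instance Metadata
    # Service; the endpoint is reachable only from inside the VM and
    # requires the "Metadata: true" header.
    PUBLIC_IP=$(curl -s -H "Metadata: true" \
      "http://169.254.169.254/metadata/instance/network/interface/0/ipConfigurations/0/publicIpAddress?api-version=2021-02-01&format=text")
    # Advertise the public address so brokers in the other cloud can
    # reach this node (server.properties path is an assumption).
    sed -i "s|^advertised.listeners=.*|advertised.listeners=PLAINTEXT://${PUBLIC_IP}:9092|" \
      /opt/kafka/config/server.properties
```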
With the updated cloud-init script, we can start to extend the existing infrastructure in the ELKH Cloud with Kafka nodes from Azure, using the azure_vm plugin. The output of the Occopus command extending the infrastructure can be found in the Gitlab repository’s kafka.vm directory, in the file logs/03_extend_hybrid.log.
After the extension of the infrastructure, we will
have one Zookeeper node running in the ELKH
Cloud, 5 Kafka nodes running in the ELKH Cloud,
and 2 additional Kafka nodes running in Azure.
At this point, we can start to attach new producers and consumers to the extended cluster. They can connect either to the ELKH Cloud-based or to the Azure-based nodes, but it is important to note that, according to the Kafka documentation, new servers will not automatically be assigned any data partitions, so unless partitions are moved to them, they will not do any work until new topics are created.
4.2 Azure Container-based Kafka
Nodes
Containers offer a simple solution for running pre-packaged applications in versatile environments. As mentioned earlier, the ELKH Cloud does not provide a native service for running containers; instead, users are required to deploy some sort of container service (for example, Docker Engine or Kubernetes) onto virtual machines, and run containers on top of that service.
Azure includes a service called Azure Container
Instances (ACI), which enables Azure users to start