With the development of neural networks, this task can be unified into a single problem: given pairs of example images from both domains, train a convolutional neural network to map the input images to the output images. In medical imaging, image-to-image translation is used to virtually generate images that were not acquired during the clinical workflow.
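As a concrete illustration (a minimal sketch, not this paper's implementation), paired translation can be framed as supervised regression: a network maps a source-modality image to a target-modality image and is trained to minimise a pixel-wise loss over example pairs. The generator G and the training-step helper below are hypothetical names.

```python
import torch
import torch.nn as nn

# Hypothetical generator G: any CNN mapping a 1-channel source image
# to a 1-channel target image (e.g., a U-Net as described later).
def training_step(G: nn.Module, optimizer, source, target):
    """One supervised step on a paired (source, target) batch."""
    optimizer.zero_grad()
    prediction = G(source)                            # G(source) should resemble target
    loss = nn.functional.l1_loss(prediction, target)  # pixel-wise L1 loss
    loss.backward()
    optimizer.step()
    return loss.item()
```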
Various neural networks have been developed and used for image-to-image translation. The generative adversarial network (GAN) is the most popular model for this task. By modifying the basic GAN model, several other networks have been developed, viz. cGAN (Isola et al., 2016), Pix2Pix (Isola et al., 2016), MedGAN (Armanious et al., 2018), and CycleGAN (Zhu et al., 2017).
In this paper, we propose a multimodality (T1WI to T2WI and vice versa) image translation model for DICOM brain images. DICOM is the prevalent medical industry standard, and DICOM images are smaller in size than the corresponding NIfTI images. We discuss the pre-processing techniques for DICOM data as well as the proposed U-Net based model. We then present a qualitative and quantitative comparison of the generated results against the ground truth, followed by the scope of future work.
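To give a flavour of DICOM pre-processing (a minimal sketch under common assumptions; this is not the paper's exact pipeline, and the rescale handling is the standard pydicom convention), a single slice can be read, rescaled to physical intensity units where the tags exist, and normalised per slice:

```python
import numpy as np
import pydicom

def load_dicom_slice(path):
    """Read one DICOM slice and normalise it to [0, 1]."""
    ds = pydicom.dcmread(path)
    img = ds.pixel_array.astype(np.float32)
    # Apply rescale tags if present (MR series often omit them).
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    img = img * slope + intercept
    # Per-slice min-max normalisation.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return img
```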
2 RELATED WORK
Numerous contributions have been made in the literature on medical image translation. However, most of these contributions pertain to the NIfTI format, which is used for research purposes. A deep network based solution to reconstruct T2WI from T1WI and a few samples of T2WI k-space, using an encoder-decoder architecture, has been proposed for NIfTI brain images in (Srinivasan et al., 2020). A comparison of image-to-image translation between T1WI and T2WI using CycleGAN and U-Net on NIfTI brain images is presented in (Welander et al., 2018). Considering the complementary information present in different modalities and the predominant industrial usage of DICOM, our work is, to the best of our knowledge, the first to utilize DICOM images in the construction of T2WI from a given T1WI, using our proposed U-Net based model.
The advantages and uses of image-to-image translation on paired and unpaired images using GANs, especially in medical imaging with deep learning, have been explained in (Kaji and Kida, 2019), (Alotaibi, 2020), and (Shen et al., 2019). In (Avula, 2020), convolutional neural networks (CNNs), which specialise in visual imagery, are explored for the reconstruction of T1-weighted glioma images from T1-weighted images. Conditional generative adversarial networks (cGANs), which enable fine-tuned contrast synthesis, are tested in (Yang et al., 2020) for cross-modality registration and MRI segmentation to perform cross-modality image-to-image translation of MRI scans. The predictive generative adversarial network (pGAN) method is compared with cGAN in (Dar et al., 2018), where both utilize adversarial loss functions and correlated structure across neighboring cross-sections for improved synthesis, particularly at high spatial frequencies. In (Xiao et al., 2018), the authors demonstrate an algorithm that learns complex mappings between different MRI contrasts and accurately transforms between T1WI, T2WI, proton density images, time-of-flight angiograms, and diffusion MRI images. A tool based on the basic U-Net model has been developed in (Neurabenn, 2020) to transform non-T1-weighted images so that they have a contrast profile similar to an adult T1-weighted image. Whole medical image synthesis using a deep encoder-decoder image synthesizer has been proposed in (Sevetlidis et al., 2016).
3 PROPOSED WORK
Several kinds of deep learning models were investigated during the literature study. Two models stood out amongst the others in synthesizing realistic high-resolution images: the encoder-decoder and the U-Net. Medical image-to-image translation requires paired images from the source and target modalities. Converting one modality to another relies on extracting features such as tissues and fat cells.
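As an illustration of how such paired data can be organised (a minimal sketch, assuming the T1 and T2 slices are already co-registered and stored as aligned file lists; the class and parameter names are hypothetical):

```python
import torch
from torch.utils.data import Dataset

class PairedT1T2Dataset(Dataset):
    """Yields co-registered (T1WI, T2WI) slice pairs as 1xHxW tensors.

    Assumes t1_paths[i] and t2_paths[i] point to the same anatomical
    slice of the same patient (alignment is a pre-processing step).
    """
    def __init__(self, t1_paths, t2_paths, load_fn):
        assert len(t1_paths) == len(t2_paths)
        self.t1_paths, self.t2_paths = t1_paths, t2_paths
        self.load_fn = load_fn  # e.g., the DICOM loader sketched earlier

    def __len__(self):
        return len(self.t1_paths)

    def __getitem__(self, i):
        t1 = torch.from_numpy(self.load_fn(self.t1_paths[i]))[None]  # add channel dim
        t2 = torch.from_numpy(self.load_fn(self.t2_paths[i]))[None]
        return t1, t2
```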
U-Net (Ronneberger et al., 2015) can be considered a modified version of the encoder-decoder architecture. As shown in Figure 2, the U-Net architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. The main idea is to supplement a contracting network with successive layers in which pooling operators are replaced by up-sampling operators, thereby increasing the resolution of the output. A successive convolution layer can then learn to assemble a more precise output based on this information.
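To make this structure concrete, here is a minimal U-Net-style sketch in PyTorch (an illustration of the idea, not the paper's exact model; the depth, channel counts, and class name are arbitrary choices for the example). Pooling downsamples in the contracting path, transposed convolutions upsample in the expanding path, and skip connections concatenate the high-resolution encoder features with the upsampled ones:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    """Two 3x3 convolutions with ReLU, preserving spatial size."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Two-level U-Net: contract, bottleneck, expand with skip connections."""
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 64), block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bott = block(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)  # up-sampling
        self.dec2 = block(256, 128)  # 256 = 128 (skip) + 128 (upsampled)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = block(128, 64)
        self.out = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)              # full resolution, 64 channels
        e2 = self.enc2(self.pool(e1))  # 1/2 resolution, 128 channels
        b = self.bott(self.pool(e2))   # 1/4 resolution, 256 channels
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)
```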
The U-Net architecture is divided into two parts: a contracting path and an expansive path. As we can see in Figure 2, in the contracting path the spatial dimensions are reduced and the number of channels is increased, while in the expansive path the dimensions are increased and the channels are decreased. Then, with a set of transformations, we end up with high-resolution features which are then combined to predict a relevant output.
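A quick shape check on the MiniUNet sketch above illustrates this contraction and expansion (the 256x256 input size is hypothetical; spatial dimensions must be divisible by 4 for this two-level example):

```python
x = torch.randn(1, 1, 256, 256)  # one single-channel 256x256 slice
net = MiniUNet()
y = net(x)
print(y.shape)  # torch.Size([1, 1, 256, 256]): output matches input size
# Inside the network: 64ch at 256x256 -> 128ch at 128x128 -> 256ch at 64x64,
# then the expansive path reverses this while fusing the encoder features.
```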