tion blur from images taken with a long exposure
time (Shan et al., 2008; Chakrabarti, 2016), and the
second is an approach that removes noise from im-
ages taken with a short exposure time (Li et al., 2015;
Zhang et al., 2017; Remez et al., 2017).
To remove motion blur, many traditional methods
estimate point spread functions (PSFs), which
represent the motion blur in an image, and remove
the blur based on the estimated PSFs (Shan
et al., 2008). Recent methods, on the other hand,
use deep neural networks to remove image blur
directly, without estimating PSFs (Chakrabarti, 2016;
Kupyn et al., 2018). However, it is difficult to recover
the details of the image by either method.
For denoising low light images, some existing
methods estimate image noise first and then enhance
low light images with denoising (Li et al., 2015).
Deep neural networks are also used to remove noise
and enhance images directly (Zhang et al., 2017;
Remez et al., 2017). However, as with the deblurring
methods, it is again difficult to recover the details
of the image. As these existing methods show,
recovering accurate high intensity images from
degraded images taken in low light is a very
difficult problem.
In recent years, a deep learning method has been
proposed that recovers a clearer high intensity
image from a single low light image than earlier
methods (Chen et al., 2018). This method succeeds
in recovering relatively clear image details when
the image contains only a small amount of noise.
However, when the image is very noisy, the details
cannot be recovered well, and the restoration
accuracy is insufficient. This is because much of
the information about the object is lost to image
noise, leaving too little information to recover a
clear image.
Thus, in this paper, we propose a method that
recovers high intensity images with higher accuracy
by using multiple low light images. The proposed
method uses sequential images of the same object
as the multiple low light images. The high frequency
components required for accurate recovery can then
be obtained from multiple images, so a more accurate
high intensity image is expected to be recovered.
However, if there is a moving object in the scene,
its position changes across the sequential images.
We therefore propose a method for generating a high
intensity image while compensating for this
difference in position. We also introduce a new
loss, called character recognition loss, which
enables us to recover high frequency components and
improve the readability of characters in the
recovered high intensity image.
3 PROPOSED METHOD
The network of the proposed method is shown in
Fig. 2. As shown in this figure, the proposed method
trains three different U-Nets (Ronneberger et al.,
2015): a moving object U-Net, a stationary object
U-Net, and a mask image U-Net.
3.1 Alignment of Moving Objects
When dealing with sequential images, point
correspondence of moving objects among the images
is very important. When recovering a high intensity
image at time $T$ from $T$ low light images
$\mathbf{I}_i$ ($i = 1, \ldots, T$), the recovery can
be performed more effectively if the optical flow of
the corresponding points is known. Thus, in this
research, the optical flow in the sequential images
is estimated in advance and used to roughly align
the moving objects in the images from time 1 to
time $T$. Fig. 3 shows an example of the alignment
performed in our method. As shown in this figure,
the misalignment of the moving object among the
sequential images is almost eliminated by this
alignment procedure. Using sequential images aligned
in this way is expected to improve the accuracy of
the high intensity image recovery.
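As a rough illustration of this alignment step, the sketch below backward-warps an earlier frame onto the pixel grid of the final frame using a dense flow field. It is only a minimal sketch under stated assumptions: the function name `warp_to_reference`, the flow convention (each reference pixel stores the offset to its matching point in the earlier frame), and nearest-neighbour sampling are all hypothetical choices, since the text does not specify how the optical flow is estimated or applied.

```python
def warp_to_reference(frame, flow):
    """Backward-warp `frame` onto the reference (final-frame) grid.

    frame: H x W list of lists of intensities (an earlier frame).
    flow:  H x W list of lists of (dx, dy). Assumed convention: for the
           reference pixel (x, y), the matching point in `frame` is
           (x + dx, y + dy).
    """
    height, width = len(frame), len(frame[0])
    aligned = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Nearest-neighbour sampling, clamped to the image border.
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            aligned[y][x] = frame[src_y][src_x]
    return aligned


# Toy example: a bright pixel moves one step to the right between frames.
earlier = [[0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0]]
reference = [[0, 0, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 0]]
# Every reference pixel finds its match one pixel to the left.
flow = [[(-1, 0)] * 4 for _ in range(3)]
aligned = warp_to_reference(earlier, flow)
```

After warping, the bright pixel of the earlier frame sits at the same position as in the final frame, so the two frames can be stacked pixel by pixel. Because the alignment is only rough, any remaining misalignment is left for the network to absorb.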
3.2 Generating High Intensity Images
from Sequential Low Light Images
In general, image noise occurs randomly for each
shot, so even if we shoot the same scene, we obtain
an image with different noise each time. Therefore,
by using multiple low light images, we can obtain
more accurate information about the scene than with
only a single low light image.
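The benefit of independent noise across shots can be checked with a small simulation. The sketch below is not the proposed method, which uses a learned network rather than averaging; it simply averages several noisy observations of the same static signal to show that multiple frames carry more information than one. The Gaussian noise model, the noise level `sigma`, and the frame count are assumptions chosen for illustration.

```python
import math
import random


def rmse(estimate, truth):
    """Root-mean-square error between two equal-length pixel lists."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    )


def simulate_frames(truth, num_frames, sigma, rng):
    """Return `num_frames` copies of `truth` with independent Gaussian noise."""
    return [[t + rng.gauss(0.0, sigma) for t in truth]
            for _ in range(num_frames)]


rng = random.Random(0)                       # fixed seed for repeatability
truth = [rng.random() for _ in range(2000)]  # static scene, flattened
frames = simulate_frames(truth, num_frames=8, sigma=0.2, rng=rng)

# Per-pixel mean over the frames (a stand-in for a learned fusion).
mean_frame = [sum(pixel) / len(frames) for pixel in zip(*frames)]

rmse_single = rmse(frames[-1], truth)  # error using only the final frame
rmse_mean = rmse(mean_frame, truth)    # error after fusing all frames
```

For independent zero-mean noise, averaging $T$ frames reduces the noise standard deviation by a factor of about $\sqrt{T}$, so `rmse_mean` comes out well below `rmse_single`. This only holds when the frames are aligned, which is why the alignment of moving objects matters.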
In this research, we consider a network that takes
the low light images at $T$ times, $\mathbf{I}_i$
($i = 1, \ldots, T$), as input and outputs a high
intensity recovered image $\mathbf{I}_R$ at time $T$,
which is the final frame. The network can therefore
be regarded as the function $F$ in the following
equation.

$$\mathbf{I}_R = F(\mathbf{I}_1, \ldots, \mathbf{I}_T) \tag{1}$$
In this research, a U-Net is used as the network
$F$. The input of the U-Net is the concatenation of
the sequential low light images, and the output of
the network is the recovered high intensity image.
In this way, the network can learn to recover an
accurate high intensity image from multiple noisy
low light images.
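One way to form such a concatenated input is sketched below. It assumes that the frames are stacked along the channel axis, so that each pixel carries the channels of all $T$ frames; this layout and the helper name `concat_frames` are illustrative assumptions, not details confirmed by the text above.

```python
def concat_frames(frames):
    """Concatenate T frames along the channel axis.

    frames: list of T images, each an H x W x C nested list.
    Returns one H x W x (T * C) nested list, where the channels of
    frame 1 come first and those of frame T come last.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            [value for frame in frames for value in frame[y][x]]
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy example: T = 3 frames of size 2 x 2 with C = 3 channels each.
# Frame t is filled with the constant value t so the order is visible.
frames = [
    [[[float(t)] * 3 for _ in range(2)] for _ in range(2)]
    for t in (1, 2, 3)
]
network_input = concat_frames(frames)
```

With this layout a single forward pass sees every frame at every pixel, so the first convolution of the U-Net can already combine information across time.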
Suppose the number of vertical pixels in the image
is H, the number of horizontal pixels is W , the number
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications