L_{GAN}(G, D) = \mathbb{E}_{y \sim p_{data}(y)}[\log D(y)] + \mathbb{E}_{I_0, I_1 \sim p_{data}(I_0, I_1)}[\log(1 - D(G(I_0, I_1)))]
On the other hand, L_{L1} represents the L1 loss shown in the following equation:

L_{L1}(G) = \mathbb{E}_{y, I_0, I_1 \sim p_{data}(y, I_0, I_1)}[\, \| y - G(I_0, I_1) \|_1 \,]
where y is the ground-truth high-resolution depth image, I_0 is a low-resolution depth image, and I_1 is a high-resolution RGB image.
By training the network as shown in Eq. (1), we obtain a generator that produces high-resolution depth images from low-resolution depth images.
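As an illustration of these two loss terms, the following is a minimal PyTorch-style sketch, not the authors' implementation; the interfaces of G and D, the numerical-stability constant eps, and the weighting between the adversarial and L1 terms (lambda_l1) are assumptions.

```python
import torch
import torch.nn.functional as F

def losses(G, D, y, I0, I1, lambda_l1=100.0, eps=1e-8):
    """y: ground-truth high-res depth, I0: low-res depth, I1: high-res RGB."""
    fake = G(I0, I1)                      # generated high-resolution depth

    # Discriminator term of L_GAN: real y should score 1, G(I0, I1) should score 0.
    loss_D = -(torch.log(D(y) + eps).mean()
               + torch.log(1.0 - D(fake.detach()) + eps).mean())

    # Generator objective: fool the discriminator and stay close to y in L1.
    loss_L1 = F.l1_loss(fake, y)          # E[ ||y - G(I0, I1)||_1 ]
    loss_G = -torch.log(D(fake) + eps).mean() + lambda_l1 * loss_L1
    return loss_G, loss_D
```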
4 DATASET
We next explain the dataset used in this research.
To train the proposed network, pairs of depth and RGB images are required. Therefore, we constructed a training dataset from the NYU Depth Dataset (Silberman and Fergus, 2011). NYU Depth is an indoor image dataset consisting of 2284 pairs of depth and RGB images. The depth and RGB images obtained from this dataset were resized to 256 × 256, and 2184 pairs were used for training and 100 pairs for testing. In this research, we conducted two experiments: a synthetic image experiment, in which LiDAR-like low-resolution depth images were created synthetically from high-resolution depth images, and a real image experiment, in which real low-resolution depth images were obtained from a LiDAR (Velodyne VLP-16). In both cases, in order to investigate how accuracy changes with the amount of information in the low-resolution depth image, we created datasets with different vertical resolutions, n = 16, 8, and 4, for the low-resolution depth images; that is, the number of vertical scan lines of the LiDAR was 16, 8, and 4. Example data used in our experiments are shown in Fig. 6.
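As a concrete illustration of how such low-resolution inputs can be produced, the sketch below keeps only n evenly spaced rows of a resized 256 × 256 depth image and zeroes out the rest. The evenly spaced sampling pattern, file names, and use of OpenCV are illustrative assumptions, not the exact procedure of this paper.

```python
import numpy as np
import cv2

def make_sparse_depth(depth_hr, n_lines):
    """Keep only n_lines rows of a high-resolution depth image (assumed pattern)."""
    sparse = np.zeros_like(depth_hr)
    rows = np.linspace(0, depth_hr.shape[0] - 1, n_lines).astype(int)
    sparse[rows, :] = depth_hr[rows, :]
    return sparse

# Hypothetical NYU Depth file names; resize both modalities to 256 x 256.
depth = cv2.resize(cv2.imread("nyu_depth_0001.png", cv2.IMREAD_UNCHANGED),
                   (256, 256), interpolation=cv2.INTER_NEAREST)
rgb = cv2.resize(cv2.imread("nyu_rgb_0001.png"), (256, 256))

low_res_inputs = {n: make_sparse_depth(depth, n) for n in (16, 8, 4)}
```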
5 EXPERIMENTS
5.1 Synthetic Image Experiments
We next show the results of the synthetic image experiments, in which a high-resolution depth image is generated from a low-resolution depth image and an RGB image by the proposed method. For comparison, we also generated high-resolution depth images from low-resolution depth images alone.
Table 1: Accuracy of recovered high-resolution depth images.

                     LiDAR only   method 1   method 2
n = 16   RMSE ↓         6.6462     5.7329     5.6673
         PSNR ↑        32.187     33.4756    33.5886
         SSIM ↑         0.9453     0.9525     0.9529
n = 8    RMSE          11.6271     9.2953     9.3441
         PSNR          27.2126    29.3198    29.1475
         SSIM           0.9117     0.9289     0.9239
n = 4    RMSE          19.2588    15.4661    16.3567
         PSNR          22.7405    24.7400    24.2165
         SSIM           0.8828     0.9020     0.8914
The generator and discriminator were trained for 5000 epochs. The batch size was 32, and Adam (Kingma and Ba, 2014) with a learning rate of 0.001 was used for optimization.
For each vertical resolution n = 16, 8, and 4, the network was trained on the 2184 training pairs, and high-resolution depth images were generated from the 100 test low-resolution images using the trained network.
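A minimal training-loop sketch consistent with the stated settings (Adam, learning rate 0.001, batch size 32, 5000 epochs) is shown below; the dataset interface, device handling, and L1 weight are assumptions, and the loop follows the usual alternating GAN update rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train(G, D, dataset, epochs=5000, batch_size=32, lr=1e-3, device="cuda"):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    opt_G = torch.optim.Adam(G.parameters(), lr=lr)
    opt_D = torch.optim.Adam(D.parameters(), lr=lr)
    eps = 1e-8
    for epoch in range(epochs):
        for y, I0, I1 in loader:               # dataset yields (GT depth, LR depth, RGB)
            y, I0, I1 = y.to(device), I0.to(device), I1.to(device)
            fake = G(I0, I1)

            # Discriminator step
            loss_D = -(torch.log(D(y) + eps).mean()
                       + torch.log(1.0 - D(fake.detach()) + eps).mean())
            opt_D.zero_grad(); loss_D.backward(); opt_D.step()

            # Generator step: adversarial term plus weighted L1 term (weight assumed)
            loss_G = (-torch.log(D(fake) + eps).mean()
                      + 100.0 * F.l1_loss(fake, y))
            opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```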
The experimental results are shown in Fig. 7.
From the results for n = 16 in Fig. 7 (a), we find that the difference between the proposed method and the existing method, which uses only depth images, is small. However, as the vertical resolution of the input depth image decreases to n = 8 and n = 4, the results of the existing method degrade severely, and the proposed method, which combines RGB images, recovers the high-resolution depth image more accurately. For example, the shapes of the desk and chair are distorted by the existing method, whereas the proposed method recovers them more accurately.
Table 1 shows the accuracy of the 100 recovered high-resolution depth images in terms of RMSE, PSNR, and SSIM. From this table, we find that for every vertical resolution, the proposed method, which uses both the RGB image and the depth image, generates more accurate high-resolution depth images than the existing method.
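For reference, the three metrics in Table 1 can be computed as sketched below; the assumed data range of 255 and the use of scikit-image are illustrative choices, since the exact evaluation code is not given here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt, data_range=255.0):
    """pred, gt: 2-D depth arrays on the same (assumed) intensity scale."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    psnr = peak_signal_noise_ratio(gt, pred, data_range=data_range)
    ssim = structural_similarity(gt, pred, data_range=data_range)
    return rmse, psnr, ssim
```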
5.2 Real Image Experiments
We next show the results obtained from the real image experiments. As in the synthetic image experiments, training was performed on the NYU Depth dataset, and low-resolution depth images obtained from a LiDAR (Velodyne VLP-16) were input to the trained network to evaluate the performance of the proposed method. We tested the proposed method and the existing method while changing the vertical resolution of the LiDAR to n = 16, 8, and 4. Calibration between the RGB camera and the LiDAR was conducted in advance by using a projective transformation.
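To make the role of this calibration concrete, the sketch below projects LiDAR points into the RGB image plane using an assumed intrinsic matrix K and extrinsics [R | t] obtained offline, producing the sparse depth image fed to the network; the exact calibration procedure of the paper is not reproduced here.

```python
import numpy as np

def project_lidar_to_depth(points, K, R, t, height=256, width=256):
    """points: (N, 3) LiDAR points in the sensor frame (hypothetical interface)."""
    cam = (R @ points.T + t.reshape(3, 1)).T          # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]                          # keep points in front of the camera
    uvw = (K @ cam.T).T                               # perspective projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.zeros((height, width), dtype=np.float32)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[valid], u[valid]] = cam[valid, 2]         # store z as the depth value
    return depth
```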