movement, which is insufficient for naturally moving around in a large indoor VR scene, let alone an outdoor scene), lacking or inconsistent audio (humans' perception of space is not only visual but also relies heavily on auditory perception of reverberation effects; this is often overlooked in VR representations of architectural use cases).
In essence, the challenges regarding faithful perception of space, distance, and scale in VR representations are substantial. More specifically, current research indicates that distance perception in VR is compressed: users' estimates of distances in VR are approximately 20% lower than the same users' estimates of distances in real life (Jensen et al., 2020). Being able to accurately estimate distances is not the only important factor in perceiving scale, but it is clearly related.
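As a purely illustrative reading of this compression figure (assuming, for simplicity, a single uniform compression factor, which the cited literature does not claim), the relation between a physical distance and its typical VR estimate can be expressed as

\hat{d}_{\mathrm{VR}} \approx (1 - c)\, d_{\mathrm{real}}, \qquad c \approx 0.2,

so that, for example, an exocentric distance of 25 m would on average be reported as roughly 20 m.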
4 METHODS
In this section we describe the approaches we have taken, and the choices we have made, towards designing a way to experimentally evaluate how viewing mode influences the accuracy of distance estimation. First we discuss our approach to letting test participants estimate distances in the various modes. Subsequently we describe central aspects of the technical implementation behind the experiments.
4.1 Estimating Distances in Viewing
Modes
The literature on distance estimation in VR separates distances into ego-centric distances (distances from oneself to some location in the environment) and exo-centric distances (distances between two locations in the environment) (Renner et al., 2013). We believe both to be equally important for the purpose of evaluating urban scale architecture in VR, and hence include both types in our experiment (Fig. 3). The literature is also extensive in terms of which method to apply when having test participants estimate experienced distances, e.g. (Peer and Ponto, 2017). Examples of applied methods are verbal reporting, blind walking, and throwing. We opt for verbal reporting for two main reasons: 1) it is the method that best suits having 360° viewing modes in the experiment, and 2) blind walking, although the most popular method in VR research, is not realistic for distances in the mid to upper action space range (10 m or more), as it can be challenging to find a suitable physical environment for carrying out the experiment.
In terms of test participant locomotion in the 6 DoF viewing mode, we opted to avoid teleportation. Only 1-to-1 physical movement is possible for exploring a local area of the virtual 3D scene, and participants are only allowed to move within an approximately 3 m by 3 m area. A mark on the ground in the virtual environment indicates where participants are to return before verbally answering questions regarding estimated distances.
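The following minimal sketch illustrates how such a constraint can be enforced in the technical implementation (engine-agnostic Python; all names, and the tolerance around the mark, are hypothetical rather than taken from the actual implementation):

```python
# Illustrative sketch of the 6 DoF locomotion constraint; names and the mark
# tolerance are hypothetical. Positions are in metres in the tracking-space
# ground plane (x, y), with the return mark at the origin.

HALF_EXTENT = 1.5          # half of the ~3 m x 3 m allowed movement area
RETURN_MARK = (0.0, 0.0)   # ground mark to return to before answering questions
MARK_RADIUS = 0.25         # hypothetical tolerance radius around the mark

def clamp_to_play_area(x: float, y: float) -> tuple[float, float]:
    """Restrict 1-to-1 physical movement to the allowed square area."""
    x = max(-HALF_EXTENT, min(HALF_EXTENT, x))
    y = max(-HALF_EXTENT, min(HALF_EXTENT, y))
    return x, y

def at_return_mark(x: float, y: float) -> bool:
    """True when the participant is standing on the mark and may be questioned."""
    dx, dy = x - RETURN_MARK[0], y - RETURN_MARK[1]
    return (dx * dx + dy * dy) ** 0.5 <= MARK_RADIUS
```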
A final important aspect of the experiment concerns test participants' perception of their own height in the different viewing modes. As described, the two 360° modes are presented to participants as monocular experiences. This causes perceptual confusion, especially when looking down at the ground, as the only perceptually plausible explanation for experiencing no binocular disparity is that what is viewed must be located at infinity. Thus, participants should subjectively feel as if they are “floating” high above the ground. For the 3 DoF and 6 DoF VR modes, test participants do perceive the correct stereo disparities, but, similarly to the 360° modes, there is no visual representation of self when looking down (you do not see your own legs and feet). For these reasons, the experiment also asks test participants to verbally report whether they feel shorter than normal, normal height, or taller than normal.
4.2 3D Model
The VR models for the user tests are made using a
workflow developed for the production of architec-
tural urban VR scenarios using parametric urban de-
sign to feed a VR model. This is done within a soft-
ware framework comprising open GIS data (as a basis
for the parametric generation of real-world urban en-
vironments), the CityEngine (CE) parametric urban
modeler (for the parametric generation of 3D urban
models), and the Unreal Engine (UE) game engine
(for the preparation of the final rendered VR model).
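The overall flow of this framework can be outlined as follows. The outline is purely illustrative: the three functions are hypothetical stand-ins for the GIS, CityEngine, and Unreal Engine stages, not actual CityEngine or Unreal Engine API calls, and the file names are likewise placeholders.

```python
# Purely illustrative outline of the model-production workflow; every function
# below is a hypothetical stand-in, not a CityEngine or Unreal Engine API call.
from pathlib import Path

def load_gis_footprints(gis_source: Path) -> list[dict]:
    """Stand-in for reading open GIS data (building footprints, heights) for the site."""
    return [{"id": 1, "polygon": [(0, 0), (20, 0), (20, 15), (0, 15)], "height": 12.0}]

def generate_parametric_city(footprints: list[dict], rule_file: str) -> dict:
    """Stand-in for CityEngine applying parametric (CGA) rules to produce the 3D model."""
    return {"rule_file": rule_file, "buildings": footprints}

def export_for_unreal(city_model: dict, output_dir: Path) -> Path:
    """Stand-in for exporting geometry that Unreal Engine imports, textures, and renders."""
    return output_dir / "urban_scene.fbx"

# Open GIS data -> CityEngine parametric 3D model -> Unreal Engine VR scene.
vr_scene = export_for_unreal(
    generate_parametric_city(load_gis_footprints(Path("site_footprints.shp")), "cardboard.cga"),
    Path("export"),
)
```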
While the parametric modeling approach potentially allows for model representation at different levels of detail (LOD) and with different texture sets, for the tests presented in this paper a consistent LOD and texturing was chosen, using the metaphor of an architectural scale model built from two types of cardboard (plain and corrugated). Geometrically, detailing was limited to adding windows, doors, pitched roofs, and cornices to buildings. Ground surfaces were textured using the same corrugated cardboard texture as the buildings, with no accentuation of curbs or other 3D features of the horizontal plane. While the 3D model represents a real-world urban space, which, in reality, has a