RoSELS: Road Surface Extraction for 3D Automotive LiDAR Point
Cloud Sequence
Dhvani Katkoria and Jaya Sreevalsan-Nair
Graphics-Visualization-Computing Lab (GVCL),
International Institute of Information Technology Bangalore (IIITB), Bangalore, India
Keywords:
Road Surface Extraction, 3D LiDAR Point Clouds, Automotive LiDAR, Ego-vehicle, Semantic Segmentation,
Ground Filtering, Frame Classification, Road Geometry, Sequence Data, Point Set Smoothing, Range View,
Multiscale Feature Extraction, Local Features, Global Features.
Abstract:
Road surface geometry provides information about navigable space in autonomous driving. Ground plane
estimation is done on “road” points after semantic segmentation of three-dimensional (3D) automotive Li-
DAR point clouds as a precursor to this geometry extraction. However, the actual geometry extraction is less
explored, as it is expensive to use all “road” points for mesh generation. Thus, we propose a coarser surface
approximation using road edge points. The geometry extraction for the entire sequence of a trajectory provides
the complete road geometry, from the point of view of the ego-vehicle. Thus, we propose an automated system,
RoSELS (Road Surface Extraction for LiDAR point cloud Sequence). Our novel approach involves ground
point detection and road geometry classification, i.e. frame classification, for determining the road edge points.
We use appropriate supervised and pre-trained transfer learning models, along with computational geometry
algorithms to implement the workflow. Our results on SemanticKITTI show that our extracted road surface
for the sequence is qualitatively and quantitatively close to the reference trajectory.
1 INTRODUCTION
Navigable space detection is a challenging prob-
lem in robotics and intelligent vehicle technology,
which requires an integrated solution from both com-
puter vision and computational geometry. In three-
dimensional (3D) automotive LiDAR point cloud pro-
cessing, navigable space implies the ground surface
on which a vehicle can traverse, which is predomi-
nantly the road surface. Here, the “ground” class of
points includes several fine-grained classes, namely,
“road”, “parking”, “sidewalk”, “terrain”, etc. (Paig-
war et al., 2020). The state-of-the-art methods per-
form ground point segmentation/detection followed
by ground plane estimation motivated as a precursor
to road surface extraction (Paigwar et al., 2020; Rist
et al., 2020). However, we find that ground plane es-
timation needs to be performed piecewise even for a
single point cloud, thus providing a coarse approx-
imation of the surface geometry. Piecewise estima-
tion requires systematic geometric analysis to deter-
mine the number, position, and orientation of planes
needed to jointly provide a water-tight surface. This is
a challenging point set processing problem, especially
as the point clouds are unstructured. Instead, we pro-
pose surface mesh extraction from the road points di-
rectly. However, generating a fine mesh with all road
points is time-consuming. This is alleviated by using
an appropriate subset of road points that sufficiently
sample the surface. Here, we propose the road edge
points and vehicle positions as this desired sample set.
Given that we use the positions from the ego-motion
of the vehicle, we can now expand the surface
extraction across all frames in a sequence. This leads
to a watertight road surface for the entire sequence,
as seen from the point of view of
the ego-vehicle. Such a process requires all the point
clouds in the sequence, which improves the utilization
of the complete dataset.
The conventional data processing workflow for
3D automotive LiDAR point clouds involves se-
mantic segmentation, which readily detects road
points. However, the semantic segmentation results
have to be post-processed to identify curb or edge
points (Behley et al., 2021). At the same time,
ground point filtering using local height differences
is a reliable solution in the LiDAR point cloud anal-
Figure 1: Summary of our proposed system, RoSELS, for
3D road surface extraction using ground points detected
from an automotive LiDAR point cloud sequence. Our sys-
tem includes two novel and significant intermediate pro-
cesses of road edge point detection and frame classification.
ysis (Arora et al., 2021). Binary clustering in point
clouds can be done here using highly statistically
significant handcrafted features, such as height dif-
ferences, to classify the points as “edge” and “non-
edge” points. Since the expectation-maximization
(EM) algorithm has been found to be effective for bi-
nary clustering of LiDAR point clouds (Kumari and
Sreevalsan-Nair, 2015), we propose a road edge point
detection method using binary clustering of ground
points. We also use the spatiotemporal locality of the
points for outlier removal to improve the ground point
detection, and thus the edge points.
Extraction of road geometry becomes challeng-
ing in the presence of turnings and complex topol-
ogy, such as crossroads. Since our work is novel
in extracting surface mesh geometry for roads, we
first focus on the workflow for straight roads. Such
a mesh generation process can be then extended to
curved roads, i.e. turnings and crossroads. Alterna-
tively, our proposed method can extract contiguous
segments of straight roads and fill the gaps between
them for short curved segments using surface correc-
tion. This solution works for most sequences with a
large fraction of contiguous straight roads. For our
requirement of identifying contiguous straight roads,
the point cloud geometry for each frame needs to be
classified. We propose a novel frame-wise point cloud
geometry classification, referred to as frame classi-
fication, using an appropriate image representation
of the geometry. We choose an intermediate image
representation specifically, as is done in state-of-the-
art deep learning classifiers for semantic segmenta-
tion (Guo et al., 2020). Here, transfer learning is used
for frame classification.
In summary (Figure 1), our proposed approach is
to detect ground points, on which both edge detec-
tion and frame classification are performed. We fur-
ther smooth the edge point set to improve the sample
set for surface mesh generation and finally extract the
road surface using geometry algorithms. Our novel
contributions are in integrating appropriate methods
in our proposed system, RoSELS (Road Surface Ex-
traction for LiDAR point cloud Sequence), for its
implementation (Figure 2). Our key contributions are:
- design and implementation of a complete auto-
  mated system using ground points, for road sur-
  face extraction from the ego-motion in 3D auto-
  motive LiDAR point clouds,
- a novel per-frame road-geometry classification,
  i.e. frame classification, using an appropriate image
  representation of the ground points to be used in
  transfer learning, and
- novel use of appropriate point set processing and
  surface mesh generation methods for performing
  edge point detection and road surface extraction,
  respectively.
2 RELATED WORK
The source data for the design of RoSELS is the 3D
automotive LiDAR point clouds in the form of se-
quence based on the trajectory of the ego-motion of
the vehicle (Behley et al., 2019; Behley et al., 2021).
While most of the existing methods work with frame-
wise analysis, the focus here is on the entire sequence.
The state of the art in 3D automotive LiDAR point
cloud processing on the following topics is relevant
to our work:
Ground Point Segmentation: Our starting point for
detecting road edge points is ground point segmen-
tation, which is equivalent to point-wise classification
into “ground” and “non-ground” points. This has been
an active area of research since the mid-2000s (Paig-
war et al., 2020; Arora et al., 2021). There are
two parallel approaches: (i) using height-based hand-
crafted features with traditional machine learning, and
(ii) using convolutional neural networks (CNNs) or
convolutional encoder-decoders, either with an image
representation of the projection (e.g. sparse pseudo-
image, bird’s eye view (BEV), range image, etc.) or
with 3D points directly (Guo et al., 2020). The latter
can be used directly for binary road segmentation (Gigli
et al., 2020), specifically. While CNNs work best in
the environments they are trained for, they do not
generalize as well to other environments and are expensive for train-
Figure 2: (Left) Our proposed workflow of RoSELS, for generating the 3D road surface from input 3D LiDAR frame-wise point
clouds in a sequence and its trajectory information (position and pose of the vehicle). Our workflow is structured and proceeds
from point-, frame-, to sequence-wise processing. (Right) Frame classification implemented on top-view images of ground
points in each frame, using transfer learning with the ResNet-50 architecture (He et al., 2016), shown in (i). The possible class
hierarchy for frames is given in (ii), of which we currently focus on the first level of straight and curved road classes.
ing. However, depending on the requirement, height-
based feature extraction has been still used for ground
point segmentation (Arora et al., 2021; Ouyang et al.,
2021). For instance, either geometry-based filter-
ing (Ouyang et al., 2021) or processing of elevation
map image (Shen et al., 2021) is done. Such images
at various resolutions serve as input to an ensemble
edge detection for probabilistic ground point segmen-
tation, through voting (Arora et al., 2021).
An alternative method is the fine-grained multi-
class semantic segmentation (Guo et al., 2020), where
the appropriate classes can be functionally combined
as “ground” class (Paigwar et al., 2020; Arora et al.,
2021; Shen et al., 2021). Projection-based methods
using deep learning form a widely used class of se-
mantic segmentation methods, for which range im-
ages are extensively used (Milioto et al., 2019; Cort-
inhal et al., 2021). Another set of networks directly
uses 3D point sample sets, e.g. RandLA-Net (Hu et al.,
2020), SCSSnet (Rist et al., 2020), etc.
RoSELS uses the lower-cost ground segmentation
solution using supervised learning with hand-crafted
features, as our goal is to identify the road edge points
from the segmented ground points.
Road Edge Extraction: Curb extraction has been
studied for mobile laser scanning (MLS)/LiDAR
point clouds (Zhao et al., 2021; Sui et al., 2021),
monocular images acquired by a moving vehicle (Stain-
vas and Buda, 2014), and elevation map from 2D laser
scanner (Liu et al., 2013). All of these methods in-
volve identifying candidate points or positions using
elevation filtering and using appropriate line fitting al-
gorithms. Our work is closest to road boundary ex-
traction for MLS point clouds (Sui et al., 2021) where
the edge points are located by searching outwards
from the vehicle trajectory. The difference, however,
is that for the MLS point clouds, the search is per-
formed in the candidate point set, whereas we exploit the
range image view of the vehicle LiDAR point cloud,
on which the search is performed.
In the benchmark SemanticKITTI dataset for ve-
hicle LiDAR point clouds, the curb points are labeled
as “sidewalk”, where the points are first labeled in
tiles by human annotators (Behley et al., 2019), and
the road boundary/curb points are then specifically re-
fined (Behley et al., 2021). In the baseline approaches
in the benchmark test of semantic segmentation im-
plemented using deep learning architectures, the IoU
(intersection over union) score for “sidewalk” is 75.2%
with RangeNet++ (Milioto et al., 2019) and 75.5% with
SCSSnet (Rist et al., 2020), which is relatively low.
Thus, we observe that the deep learning solutions
for semantic segmentation cater well to classifying
road points, but have a gap in road edge point de-
tection. This can be explained by the class imbal-
ance. Hence, we propose a structure-aware method
for road edge detection from ground points. Our
approach is to detect road edge points using the
height-based handcrafted features in supervised learn-
ing methods (Arora et al., 2021).
Scene Classification: We look at the state of the
art in scene classification, which is the closest to our
novel frame classification. Coarse scene classifica-
tion has been done on satellite or aerial images using
transfer learning (Zhou et al., 2018b), where ResNet
(Residual Network) (He et al., 2016) has demon-
strated near-accurate performance. ResNet with 50
layers (ResNet-50) is optimal in performance and cost
for land-cover classification of remote sensing im-
ages (Scott et al., 2017). Road type classification
based on its functionality as “highway” and “non-
highway” has been done on the KITTI vision bench-
mark suite using AlexNet (Krizhevsky et al., 2012).
Since ResNet-50 has worked better on aerial images,
we choose to use the same in RoSELS instead of
AlexNet, as we require a deep learning architecture
that works best for the top-view of the road. The top-
views perceptually show a clear distinction between
different road geometry classes, namely the “straight”
and “curved” roads.
Ground Plane Estimation: Recent work on road
extraction has considered the ground plane estimation
and segmentation to be a precursor to the geometry
extraction (Paigwar et al., 2020; Rist et al., 2020).
GndNet uses 2D voxelization or pillars to generate
a pseudo-image which then is passed on to a con-
volutional encoder-decoder to estimate ground eleva-
tion (Paigwar et al., 2020). SCSSnet uses semantic
segmentation to identify ground points and performs a
simple ground plane estimate (Rist et al., 2020). Since
our goal is to perform coarse geometry extraction di-
rectly, we identify edge points and triangulate them
along with trajectory points. RoSELS is also differ-
ent from the mesh map, which is a triangulation of an
automotive LiDAR point cloud using surface normals
derived from range images (Chen et al., 2021).
3 PROPOSED WORKFLOW &
IMPLEMENTATION
We propose a novel workflow to extract approxi-
mate road geometry for straight roads. The work-
flow of RoSELS consists of five key steps, namely,
(S1) ground point detection, (S2a) road edge point
detection, (S2b) frame classification, implicitly giving
the road geometry, (S3) edge point set smoothing, and
(S4) 3D road surface extraction. As shown in Fig-
ures 1 and 2:
- S1 is a point-wise operation, i.e. it is implemented
  on each point in the point cloud, i.e. a frame.
- The frame-wise operations, S2a and S2b, are de-
  coupled and implemented in parallel.
- S3 and S4 are sequence-wise operations, and
  hence require the trajectory information of the
  vehicle for the entire sequence.
The overall workflow of RoSELS, i.e. S1 to S4, is
captured in Algorithm 1. The partial workflows
of the point-wise classification process (S1) and the
sequence-wise road edge extraction (S2a, S2b, S3) are
given in Algorithms 2 and 3, respectively. Our pro-
posed road surface extraction gives the surface as vis-
ible from the point-of-view of the vehicle. Hence, the
vehicle is called an ego-vehicle (Rist et al., 2020).
S1 Ground Point Detection: The point cloud is
classified into “ground” and “non-ground” points for
ground point detection, for which the motivation is
explained in Section 2. S1 involves two sequential
substeps, namely, outlier removal and semantic seg-
mentation. Here, we exploit the temporal and spatial
locality of the points.
Outlier Removal Point cloud registration or scan
matching, which is widely implemented using the It-
erative Closest Point (ICP) registration (Besl and
McKay, 1992), is performed on two different point
clouds to find the correspondence pairs of points be-
tween the two. A correspondence pair implies that
there exists an affine transformation (e.g., scaling, ro-
tation and translation) to make a point in one cloud
equivalent to a point in the other cloud.
The registration uses temporal locality, i.e. points
in a frame must be preserved in consecutive frames.
Thus, we find correspondence pairs of points in con-
secutive frames and mark the remaining points as
“outliers” to be filtered out. Owing to the continu-
ity of motion across frames, iterative registration on
three consecutive frames is implemented at a time.
For a given current frame x, we perform registration
in two steps. In the first step, outliers are removed
in frame (x-1), using registration between frames
(x-1) and (x-2). Then, the same process is repeated
on frame (x), after performing registration between
(x) and (x-1).
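As a minimal sketch, this two-step filtering can be expressed with Open3D's ICP registration, the library we use in our implementation; the 0.5 m correspondence distance and the helper name are illustrative assumptions:

```python
# Sketch: outlier removal using ICP correspondences between consecutive frames.
import numpy as np
import open3d as o3d

def remove_outliers(curr_xyz, prev_xyz, max_corr_dist=0.5):
    """Keep only the points of the current frame that have an ICP
    correspondence in the previous frame; the rest are outliers."""
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(curr_xyz))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(prev_xyz))
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # correspondence_set holds (source index, target index) pairs;
    # source points without a pair are marked as outliers and dropped.
    matched = np.unique(np.asarray(result.correspondence_set)[:, 0])
    return curr_xyz[matched]

# For a current frame x: clean frame (x-1) against (x-2) first, and then
# clean frame (x) against the cleaned frame (x-1), as described above.
```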
Semantic Segmentation Many of the objects cor-
responding to non-ground points have elevation (z)
higher than that of the ground. Hence, height-based
features are best suited for differentiating ground
points from others (Arora et al., 2021). The “ground”
class is a combination of several fine-grained seman-
tic classes pertaining to the ground, thus making it a
coarser class. We extract the local and global spatial
handcrafted features, and use them in Random For-
est Classifier (RFC) (Breiman, 2001) to segment the
point cloud into the ground and the non-ground classes.
The feature extraction is implemented on the point
cloud in each frame, after outlier removal. Here, we
compute multi-scale local height features, for three
scales. Multi-scale features, i.e. features captured at
different spatial resolutions, are known to work bet-
ter than a single scale for LiDAR point classifica-
tion using RFC (Weinmann et al., 2014). Here, for
each point, we select a hybrid neighborhood search
that combines the criteria of the spherical and the k-
nearest neighborhoods (knn). Thus, we identify at
most k-nearest neighbors (knn) of a point that are
within a given distance, r, from the point. The height
features used for ground point detection are listed in
Table 1. These extracted features are computed and
used in an RFC for both training and testing.
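To make the feature computation concrete, the following sketch combines Open3D's hybrid KD-tree search (at most k neighbors within radius r, with the values from Section 4.2) and a scikit-learn RFC; the feature list is an abridged reading of Table 1, and the helper name is ours:

```python
# Sketch: multi-scale height features (abridged from Table 1) fed to an RFC.
import numpy as np
import open3d as o3d
from sklearn.ensemble import RandomForestClassifier

def ground_features(xyz, r=1.0, ks=(50, 100, 200)):
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(xyz))
    tree = o3d.geometry.KDTreeFlann(pcd)
    z = xyz[:, 2]
    z_mean = z.mean()                 # frame-wise global statistic
    feats = []
    for p in xyz:
        row = []
        for k in ks:                  # one feature block per scale
            _, idx, _ = tree.search_hybrid_vector_3d(p, r, k)
            zn = z[np.asarray(idx)]   # neighborhood heights
            row += [zn.max() - p[2],  # difference from local max.
                    zn.mean() - p[2], # difference from local mean
                    zn.std()]         # local standard deviation
        row += [p[2],                 # height value (z-coordinate)
                p[2] - z_mean,        # difference from frame mean
                np.linalg.norm(p),    # distance from sensor
                np.arctan2(p[2], np.linalg.norm(p[:2]))]  # elevation angle
        feats.append(row)
    return np.asarray(feats)

# clf = RandomForestClassifier().fit(ground_features(train_xyz), train_labels)
# is_ground = clf.predict(ground_features(frame_xyz))
```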
S2a Road Edge Point Detection: The points on
the road edges are those ground points that physically
interface with the curb/sidewalk (Behley et al., 2021).
RoSELS requires extraction of both the left and right
banks of the road. Edge detection is a well-studied
problem in image processing, where gradient infor-
Table 1: Point-wise features at each scale for ground detection (S1).

- Local Features (Height-based): difference from max., difference from mean, standard deviation.
- Global Features, Point-based (Height-based): value (z-coordinate), difference from mean (of frame); (Position-based): distance from sensor, elevation angle θ.
- Global Features, Frame-based (Height-based): difference from mean, standard deviation.
mation is used for identifying the edges in images and
is implemented using the widely used three-step pro-
cess, which includes differentiation, smoothing, and
labeling (Ziou and Tabbone, 1998). Applying the
same approach as in image processing, the height gra-
dient is used as the characteristic feature to identify
the points on road edges.
However, our method of road edge detection is
different from the edge detection in images in two
salient ways. Firstly, smoothing is needed for only
road edge points, and not the entire point cloud, unlike
the image smoothing done for edge detection in im-
ages. Thus, we perform edge smoothing and labeling
steps, which are now jointly referred to as edge point
set smoothing, on ground points. Secondly, in our
case, the road extraction depends on the road geome-
try information, i.e. determined during S
2b
. Addition-
ally, unlike the differentiation step which is done on a
frame, the edge point set smoothing (i.e. S
3
) requires
the information of the sequence trajectory for coor-
dinate system transformation. Thus, the three steps
do not follow successively, here. Thus, S
2a
is exclu-
sively for the differentiation step, and edge point set
smoothing and labeling are implemented in S
3
.
Height Gradient-based Differentiation Here, the
first-order derivatives or gradients of the height values
are computed exclusively on the ground points. We pro-
pose point clustering for this task with two specific
requirements. Firstly, we cluster road points into re-
gions with low and high height-gradient, referred to as
“flat” and “non-flat” regions, respectively. Secondly,
the features needed for clustering are computed using
height differences. Of the hand-crafted features used
for semantic segmentation of 3D airborne and terres-
trial LiDAR point clouds (Weinmann et al., 2014), we
choose the two appropriate height-difference (Δz) fea-
tures, namely, in a local neighborhood, and in the 2D
accumulation map. The 2D accumulation map gen-
erates local neighborhoods of points projected to the
xy-plane, within a square of fixed length (e.g. 0.25 m),
centered at the point. For the clustering process, we
observe that clear clusters do not exist in vehicle
LiDAR point clouds. The expectation-maximization
(EM) algorithm (Dempster et al., 1977) is known
to work better than k-means clustering in such scenar-
ios. The EM algorithm works with an underlying
assumption of the existence of a Gaussian Mixture
Model (GMM) in the data. Thus, assuming a bimodal
data distribution in the 2D feature space, we use the
EM algorithm to determine two clusters of points be-
longing to the flat and non-flat regions.
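A minimal sketch of this clustering, assuming the two Δz features are precomputed per ground point as an (n_points, 2) array; scikit-learn's GaussianMixture, which is fitted with the EM algorithm, is the model named in our implementation notes:

```python
# Sketch: EM-based binary clustering of ground points into flat and
# non-flat regions, from two precomputed height-difference (Δz) features.
import numpy as np
from sklearn.mixture import GaussianMixture

def nonflat_mask(dz_features):
    gmm = GaussianMixture(n_components=2, random_state=0).fit(dz_features)
    labels = gmm.predict(dz_features)
    # The component with the smaller mean height differences is the flat one.
    flat_component = np.argmin(gmm.means_.sum(axis=1))
    return labels != flat_component  # True => non-flat (candidate edge region)
```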
Projection to Range Images The edge points fall in
the non-flat regions, where those closest to the cen-
terline, i.e. the trajectory, are the desired ones. For
a frame-wise operation of centerline detection, the
range image of the frame is the best representation of
the frame to use. The centerline is defined as the col-
umn of the range image where the sensor, i.e. the ego-
vehicle, is positioned. A range image is a dense ras-
terized representation of the scene as visible from the
ego-vehicle. Thus, it is generated as the spherical pro-
jection of the points nearest to the ego-vehicle, and
the pixels are colored based on the attribute of the
nearest point in the pixel. The image resolution is
given by the angular resolution in the elevation and
azimuthal angles. For instance, the Velodyne HDL-
64E S2, which was used for SemanticKITTI data ac-
quisition (Behley et al., 2019), has 64 angular subdi-
visions (i.e. 0.4°) in the elevation angle spanning
26.8°, and a 0.08° angular resolution over the 360°
azimuthal angle, which gives range images of
64 × 4500 resolution. (This information is from the
sensor specification sheet as published by the sensor
manufacturer.)
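The spherical projection can be sketched as follows for the resolution given above; the split of the vertical field of view into fov_up and fov_down is an assumed, typical value for this sensor, not a figure from the text:

```python
# Sketch: project a frame to a 64 x 4500 range image that stores, per pixel,
# the index of the nearest point (nearest-point-wins rasterization).
import numpy as np

def to_range_image(xyz, H=64, W=4500, fov_up_deg=2.0, fov_down_deg=-24.8):
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    r = np.linalg.norm(xyz, axis=1)
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])
    pitch = np.arcsin(xyz[:, 2] / r)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W          # azimuth bin
    v = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).astype(int)
    v = np.clip(v, 0, H - 1)                                     # elevation bin
    img = np.full((H, W), -1, dtype=int)   # -1 marks empty pixels
    order = np.argsort(-r)                 # write nearest points last
    img[v[order], u[order]] = order
    return img
```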
Edge Detection In order to determine the edge
points, we propose the use of a scanline algorithm
on the range image. We first scan the image of size
H × W row-wise, where the key positions relative to
the ego-vehicle, in the pixel space, are:
- at P_cf, i.e. (0, W/2), which indicates the centerline
  column in the front;
- at P_cbL, i.e. (0, 0), which indicates the centerline
  column in the back (rear), but on the left side of
  the ego-vehicle; and
- at P_cbR, i.e. (0, W), which indicates the centerline
  column in the back, but on the right side.
Note that the left and right sides of the ego-vehicle
are with respect to its front face. Thus, at each row,
the pixels on the centerline columns are used as the
reference for scanning the pixels in the row in the ap-
propriate direction, until a pixel containing a non-flat
region point is encountered. For the front left and right
pixels of non-flat region points, we traverse from P_cf
towards P_cbL and P_cbR, respectively. Similarly, on the
rear side, we traverse from P_cbL to P_cf for the left side,
and from P_cbR to P_cf for the right side. After locating
these pixels, their corresponding 3D LiDAR points
are to be determined. We refer to these row-wise
points as p_fL, p_bL on the left side, and p_fR, p_bR on the
right side. These points are added to the side-specific
sets, EP_L and EP_R, for the left and right sides, respec-
tively, in each frame.
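The following sketch is our reading of this scanline search for a single row; img is a range image of point indices as in the projection sketch above, and nonflat is a per-point boolean mask from the height-gradient clustering (both helper names are ours):

```python
# Sketch: per-row scanline search for the four edge points on the range
# image, where img[row, col] holds a point index (-1 = empty pixel).
def scan_row_edges(img, nonflat, row):
    H, W = img.shape
    c = W // 2                                 # centerline column (P_cf)

    def first_nonflat(cols):
        for u in cols:
            idx = img[row, u]
            if idx >= 0 and nonflat[idx]:
                return idx                     # 3D point index in that pixel
        return None

    p_fL = first_nonflat(range(c, -1, -1))     # from P_cf towards P_cbL
    p_fR = first_nonflat(range(c, W))          # from P_cf towards P_cbR
    p_bL = first_nonflat(range(0, c))          # from P_cbL towards P_cf
    p_bR = first_nonflat(range(W - 1, c, -1))  # from P_cbR towards P_cf
    return p_fL, p_fR, p_bL, p_bR              # collected into EP_L and EP_R
```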
In this step, the height differences pertaining to
other surface variations on the road, e.g. potholes, are
disregarded. We visualize the points in EP_L and EP_R
and ensure that the edges of other surface artifacts are
not labeled as road edge points. This is significant, as
the artifact points would adversely impact the perfor-
mance of RoSELS.
S2b Frame Classification: The underlying road
geometry influences the surface extraction method, as
expected. Our proposed approach uses the road edge
points for generating triangulated (surface) meshes.
The edge points along the road boundary are to
be sampled sufficiently for accurate edge extraction.
This sampling is dependent on the road curvature.
We first consider a broad classification of
“straight” and “curved” roads (Figure 2, (Right)(ii)).
We restrict our current work to straight roads for three
reasons. Firstly, curved roads need more samples as
edge points so that the edges can be extracted with
sufficient accuracy, and the sample size is determined
using geometric methods. Secondly, to extract the
curved road edges piecewise, the larger road topol-
ogy is not sufficiently captured from the point of view
of the ego-vehicle. The road topology for turnings
and crossroads involves T- and X-intersections, which
need to be captured; this is beyond the scope of the
current workflow. Thirdly, additional interior road
points are needed to extract curved road surfaces ap-
propriately. Addressing these three issues requires an
in-depth study, which is beyond the scope of our cur-
rent work. Hence, we show a proof-of-concept of our
workflow for straight roads exclusively.
Transfer Learning using ResNet-50 Architecture
For a 3D LiDAR point cloud sequence, we observe that
each frame distinctly demonstrates the road geometry
from its top-view, i.e. the 2D projection of the points on
the x-y plane. To exploit the perceptual differences
between frames, we propose the use of transfer learn-
ing using ResNet-50, which has been used for effec-
tive scene classification of perceptually distinguish-
able images (He et al., 2016).
The attribute values of the points are used to
render the 2D top-view RGB image using a percep-
tually uniform sequential colormap, i.e. the viridis
colormap. The sequential colormap is further dis-
cretized to a predetermined number of bins, say 5
bins. The ground points detected in S1 are ren-
dered using the colormap with respect to their remis-
sion values. We implement transfer learning with the
ResNet-50 model on these images (Figure 2(Right)).
Here, pre-trained weights for image classification on
ImageNet are used, as per the de facto standard in
transfer learning on images.
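A minimal sketch of this substep with Keras, consistent with our implementation notes (ResNet-50, ImageNet weights, five epochs); the rendering helper, the 224 × 224 image size, and the classification head are illustrative assumptions:

```python
# Sketch: top-view rendering with a binned viridis colormap, and transfer
# learning with a frozen ResNet-50 backbone for straight/curved frames.
import matplotlib.pyplot as plt
from tensorflow import keras

def render_top_view(ground_xyz, remission, out_png, bins=5):
    cmap = plt.get_cmap('viridis', bins)       # discretized sequential colormap
    plt.figure(figsize=(2.24, 2.24), dpi=100)  # assumed 224 x 224 pixel output
    plt.scatter(ground_xyz[:, 0], ground_xyz[:, 1],
                c=remission, cmap=cmap, s=0.5)
    plt.axis('off')
    plt.savefig(out_png, bbox_inches='tight', pad_inches=0)
    plt.close()

base = keras.applications.ResNet50(weights='imagenet', include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                         # freeze the pre-trained backbone
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation='softmax'),   # straight vs. curved road
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=5)  # five epochs, per Section 3
```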
S3 Edge Point Set Smoothing: Now, the road edge
points identified in S2a are fitted to form edges. These
edges are then smoothed, owing to the noise in them.
The smoothing is done separately for the left and right
sides of the road to avoid filtering out relevant points.
Edge labeling refers to the localization of edges and
filtering out false positives. Thus, S3 includes both
smoothing and labeling. The edge processing is im-
plemented in the world coordinate system, which con-
tains the entire trajectory of the sequence. Hence, the
first substep is the coordinate system transformation.
Local to World Coordinate System Transformation
This transformation ensures that the smoothed
edge exists as-is in the 3D world space. Also, the
smoothing and transformation operations are non-
commutative, i.e. the order of their implementation
has to be strictly maintained. Hence, we now add
the trajectory information as an input to the work-
flow (Figure 2(Left)). This input contains the posi-
tion and pose of the ego-vehicle at each frame of the
sequence. The change in position and pose is repre-
sented as transformation matrices. These matrices are
applied to the edge points in each frame to transform
them to the 3D world space.
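As a sketch, assuming each frame's position and pose are given as a 4 × 4 homogeneous transformation matrix (as in the KITTI pose files), the transformation is a single matrix product:

```python
# Sketch: transform per-frame edge points to the 3D world coordinate system.
import numpy as np

def to_world(points_xyz, pose):
    """points_xyz: (N, 3) edge points; pose: (4, 4) frame pose matrix."""
    homog = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    return (pose @ homog.T).T[:, :3]
```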
Point Set Smoothing and Labeling The straight
road edges are smoothed using the transformed co-
ordinates. We first determine subsequences of frames
that form contiguous segments of straight roads. This
is implemented separately for the left and right sides.
The random sample consensus (RANSAC) line fit-
ting model (Fischler and Bolles, 1981) is applied to
each such subsequence. Thus, we get disconnected
smooth line segments, that look like dashed lines, on
both sides of the road.
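A sketch of the per-segment smoothing with scikit-image's RANSAC, the library named in our implementation notes; the residual threshold and the trial count are assumed values, and the inliers are projected onto the fitted line to yield the smoothed edge:

```python
# Sketch: RANSAC line fitting for one contiguous straight-road edge segment.
import numpy as np
from skimage.measure import LineModelND, ransac

def smooth_edge_segment(edge_xyz, residual_threshold=0.3):
    model, inliers = ransac(edge_xyz, LineModelND, min_samples=2,
                            residual_threshold=residual_threshold,
                            max_trials=1000)
    origin, direction = model.params
    t = (edge_xyz[inliers] - origin) @ direction   # parameter along the line
    return origin + np.outer(t, direction)         # smoothed edge points
```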
S4 3D Road Surface Extraction: After smooth-
ing, the road surface is extracted as a triangulated
mesh formed with the left and right edge points for
each contiguous segment of the straight road. Here, a
constrained Delaunay tetrahedralization (Shewchuk,
2002) is implemented and is followed by the extrac-
tion of the outer/external surface of the tetrahedral
mesh. This generates better quality triangles com-
pared to performing 2D Delaunay triangulation on
projections of the 3D points.
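A sketch with PyVista, per our implementation notes; note that PyVista exposes VTK's unconstrained 3D Delaunay filter, so this stands in for the constrained tetrahedralization cited above:

```python
# Sketch: tetrahedralize the merged edge and trajectory points, then extract
# the outer/external surface as the triangulated road mesh.
import numpy as np
import pyvista as pv

def extract_road_surface(points_xyz):
    cloud = pv.PolyData(np.asarray(points_xyz))
    tets = cloud.delaunay_3d()       # tetrahedral mesh of the point set
    return tets.extract_surface()    # outer triangulated surface

# surface = extract_road_surface(edge_and_trajectory_points)
# surface.plot(show_edges=True)
```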
Implementation of RoSELS: RoSELS has been
implemented on an Intel Core i7 CPU with 12 GB of
RAM. We have used Open3D library APIs (Zhou
et al., 2018a) for point cloud registration in S1. For
neighborhood computation in S1 and S3, the Open3D
KDTree has been used. The scikit-learn library
APIs (Buitinck et al., 2013) have been used for im-
Input : A sequence S of frame-wise point clouds {P(f_i) : 0 ≤ i < n_frames} with frame f_i at index i
Input : Trajectory information of the sequence T(S)
Output: 3D surface mesh of the extracted road R_m

P_edge ← {}                              // Set of edge points in S
for frame f in S do
    // Ground point detection using Algorithm 2
    G_p(f) ← ground-point-detection(P(f))
    // Frame classification using top-view image as straight- or curved-road
    TV_img(f) ← projection-xy-plane(G_p(f))
    road_type(f) ← classify-using-transfer-learning(TV_img(f))
    // Straight-road edge detection using Algorithm 3
    if road_type(f) is "straight-road" then
        E_p(f) ← straight-road-edge-detection(G_p(f))
        P_edge ← P_edge ∪ E_p(f)         // Merging all edge points
    end
end
R_m ← generate-triangulated-mesh(P_edge)
return R_m

Algorithm 1: The complete workflow of RoSELS for road surface extraction from a sequence.
plementing the RFC and GMM models in S1 and S2a,
respectively. The frame classification model in S2b has
been implemented using Keras APIs (Chollet et al.,
2015), and the model has been trained for five epochs.
Edge point set smoothing in S3 has used the RANSAC
model from the scikit-image library. PyVista library
APIs (Sullivan and Kaszynski, 2019) have been used
for geometry computation in S4.
4 EXPERIMENTS & RESULTS
RoSELS specifically requires an input dataset that has
sequence(s) of LiDAR point clouds through a trajec-
tory of the vehicle, and also, sufficient annotations for
generating machine learning solutions. In that regard,
SemanticKITTI (Behley et al., 2019) serves well as
our test data.
Input : Point cloud P(f) at a frame f
Output: Set of ground points G_p(f)

G_p(f) ← {}                              // Set of ground points
for point p in point cloud P(f) do
    // Extraction of all features, as given in Table 1
    for 0 ≤ i < n_scales do
        N_i ← find-local-neighborhood(p, neighborhood size)
    end
    F_p ← compute-features(p, N_1, ..., N_{n_scales})
    // Classification of points as ground or non-ground points
    type(p) ← classify-using-Random-Forest-Classifier(F_p)
    // Add ground points to the output
    if type(p) is "ground" then
        G_p(f) ← G_p(f) ∪ {p}
    end
end
return G_p(f)

Algorithm 2: Ground point detection per frame, i.e. S1.
4.1 Dataset
The SemanticKITTI dataset (Behley et al., 2019) has
been published primarily for three benchmark tasks,
namely, semantic segmentation using single scans, se-
mantic segmentation using multi-temporal scans, and
semantic scene completion. Since our work is different from the bench-
mark tasks, validation is not readily available for the
dataset. Given its fit as input to RoSELS, we use Se-
manticKITTI for our experiments and provide an ap-
propriate qualitative and quantitative assessment.
The SemanticKITTI dataset comprises over
43,000 scans, of which over 21,000 are from the train-
ing sequences, IDs 00 to 10. We have used the se-
quence 08 as the validation/test set, as prescribed by
the data providers, thus training our model on the re-
maining training sequences for our classifier models,
i.e. RFC model for ground point detection (S1), and
transfer learning model for frame classification (S2b).
We have only used every 10th frame of the training se-
quences of SemanticKITTI, since frames are captured
at 0.1 s intervals and our subsampling ensures that sig-
nificant variations in the vehicle environment are cap-
tured without incurring high computational costs. We
have found that including more overlapping data re-
sulted in increased computation without adding new
information.
Overall, the dataset has annotations for 28 dis-
tinct classes for the semantic segmentation bench-
mark task. We consider five such classes, namely
Input : Set of ground points G_p(f) at a frame f with label "straight-road"
Input : Trajectory information of the sequence T(S)
Output: Set of road edge points E_p(f)

NF_p(f) ← {}                             // Set of non-flat region points
for point p in G_p(f) do
    A_m ← compute-accumulation-map(p)
    N_g ← find-local-neighborhood(p, neighborhood size)
    F_z ← compute-local-height-difference-features(G_p, N_g, A_m)
    region_type(p) ← classify-using-GMM(F_z)   // Classify as flat or non-flat region
    if region_type(p) is "non-flat" then
        NF_p(f) ← NF_p(f) ∪ {p}
    end
end
// Generate range image of size (H, W) using non-flat region points
R_img(f) ← range-image-generation(NF_p(f), G_p(f), H, W)
EP_L ← {}
EP_R ← {}
// Detect edge points from range image
for 0 ≤ row < H do
    P_cf ← (row, W/2)                    // Determine centerline pixel in the front side,
                                         // using the column for the sensor
    P_cbL, P_cbR ← (row, 0), (row, W)    // Determine centerline pixels in the back (rear) side,
                                         // using the columns for the sensor
    p_fL ← point-in-pixel-furthest-from-centerline-in-pixel-interval(P_cf, P_cf, P_cbL)
    p_fR ← point-in-pixel-furthest-from-centerline-in-pixel-interval(P_cf, P_cf, P_cbR)
    p_bL ← point-in-pixel-furthest-from-centerline-in-pixel-interval(P_cbL, P_cbL, P_cf)
    p_bR ← point-in-pixel-furthest-from-centerline-in-pixel-interval(P_cbR, P_cbR, P_cf)
    EP_L ← EP_L ∪ {p_fL, p_bL}
    EP_R ← EP_R ∪ {p_fR, p_bR}
    // Correct the selected edge points in 3D world space
    for point p in {p_fL, p_fR, p_bL, p_bR} do
        p ← transform-using-trajectory-information(p, T(S))
    end
end
// Postprocessing edges to remove outliers
EP_L ← smooth-edge-using-RANSAC-line-fitting(EP_L)
EP_R ← smooth-edge-using-RANSAC-line-fitting(EP_R)
E_p(f) ← EP_L ∪ EP_R
return E_p(f)

Algorithm 3: Straight road edge detection per frame, followed by collation of edge points from all frames (S2a, S2b, S3).
“road”, “parking”, “sidewalk”, “other ground”, and
“terrain”, together as the “ground” class in RoSELS.
Thus, the “non-ground” class implies the remaining
classes, i.e. movable objects, such as “car”, “bicy-
cle”, etc., and stationary objects, such as “building”,
“fence”, “vegetation”, etc. The curbs of the road are
labeled as “sidewalk” (Behley et al., 2021) and are im-
portant in our evaluation.
For the frame classification, we have manually an-
notated all frames in all training sequences, i.e. from
00 to 10, into “straight”, “crossroad”, and “turning”.
4.2 Parameter Setting and Experiments
For multi-scale feature extraction for ground point
detection using RFC, hybrid criteria for neighbor-
hood determination have been used for three differ-
ent scales. We have commonly used the constraint
of r = 1 m for the spherical neighborhood in all the
scales, and variable k values for the knn neighbor-
hood, i.e. k = 50, 100, 200 neighbors. We have sys-
tematically experimented with several combinations
of neighborhood criteria to arrive at this parameter
Table 2: Specifications of the SemanticKITTI sequences used for training and validation/testing in RoSELS.

Seq. ID | # Frames (Total / Used) | GT frames: Straight / Crossroad / Turning | # Points | GT: # “Ground” points, “Road”* (%) | After outlier removal in S1: # “Ground” points, “Road”* (%)

Training:
00 | 4,541 / 455 | 238 / 205 / 12 | 55,300,603 | 21,242,723, 45.2 | 20,928,740, 45.2
01 | 1,101 / 111 | 69 / 23 / 19 | 11,737,924 | 6,684,753, 71.8 | 6,425,398, 72.0
02 | 4,661 / 467 | 195 / 134 / 138 | 58,678,800 | 26,955,344, 42.8 | 26,568,086, 42.8
03 | 801 / 81 | 17 / 48 / 16 | 10,038,550 | 4,563,802, 48.8 | 4,485,650, 49.0
04 | 271 / 28 | 25 / 3 / 0 | 3,518,075 | 1,816,228, 65.9 | 1,779,528, 66.4
05 | 2,761 / 277 | 172 / 102 / 3 | 34,624,816 | 14,025,815, 40.5 | 13,802,511, 40.5
06 | 1,101 / 111 | 54 / 57 / 0 | 13,567,503 | 8,417,991, 34.1 | 8,223,230, 34.1
07 | 1,101 / 111 | 54 / 57 / 0 | 13,466,390 | 5,301,837, 48.1 | 5,233,937, 48.2
09 | 1,591 / 160 | 42 / 39 / 79 | 19,894,193 | 9,313,682, 45.0 | 9,159,419, 45.1
10 | 1,201 / 121 | 64 / 32 / 25 | 15,366,254 | 5,608,339, 43.7 | 5,487,403, 43.9
All | 19,130 / 1,922 | 930 / 700 / 292 | 236,193,108 | 103,930,514, 45.3 | 102,093,902, 45.3

Testing (filtered and classified in S1):
08 | 4,071 / 408 | 261 / 124 / 23 | 50,006,369 | 21,943,921, 40.3 | 20,919,150, 41.1

* Our annotation of “ground” combines five classes, namely, “road”, “parking”, “sidewalk”, “terrain”, and “other ground”, as given in the SemanticKITTI dataset. Each “Road” percentage gives the fraction of the corresponding “ground” points that are annotated as “road” in SemanticKITTI; the retained or improved fractions after outlier removal demonstrate the efficiency of the processes in S1.
setting. Similarly, we have used a hybrid crite-
rion of r = 1 m and k = 50 for finding the local
neighborhood of ground points for computing height-
difference features to be used in the GMM for detect-
ing flat and non-flat regions.
We have used sequences 01, 05, 07, and 08 from
the training dataset for road surface extraction. We
have also tested our proposed method on sequence 15
from the test dataset. Our results for all the sequences
are given in Figure 3. The performance of our edge
point set smoothing in S3 on sequence 07 is demon-
strated in Figure 4.
4.3 Results
For each step in our workflow, we perform both qual-
itative analysis using visualization and appropriate
quantitative evaluation.
S1: Ground Detection: The details of the sequences
used in our models are given in Table 2. For all train-
ing sequences, we observe that the percentage of road
points preserved as ground points does not reduce af-
ter outlier removal in S1. This shows the efficiency
of our outlier removal process while preserving the
“road” points. The results of our ground detection
using RFC and different experiments we performed
by including multi-scale features and registrations are
given in Table 3. The table shows the average ac-
curacy and the mean IoU (mIoU) for ground class
across all frames of test sequence 08. We observe that
GndNet (Paigwar et al., 2020) has reported an mIoU
of 83.6%, but is not comparable here, as their mIoU
has been calculated across both the ground and non-
ground classes together. Similarly, ground segmenta-
tion in (Arora et al., 2021) has reported an mIoU for
the “ground” class as 78.46% but it is not compara-
ble as their “ground” class includes “vegetation” ad-
ditionally. While we cannot directly compare, these
mIoU scores indicate that our approach for ground
point classification shows a considerably high level
Table 3: Ground detection using random forest classifier (RFC).

Set of points to be classified | # Scales for local features | Classification accuracy (%) | mIoU (%)
All points | 1 (single scale) | 96.37 | 89.38
All points | 3 (multi-scale) | 96.63 | 90.63
Filtered* points | 1 (single scale) | 96.58 | 89.47
Filtered* points | 3 (multi-scale) | 96.91 | 90.79

* Filtered points are those that were retained after outlier removal in S1.
Table 4: Results of frame classification using transfer learning.

# Class hierarchy* levels | Class outcomes | Classification accuracy (%)
1 | Straight road, Curved road | 82.35
2 | Straight road, Crossroad, Turning | 78.51

* The frame class hierarchy is as shown in Figure 2 (Right, (ii)).
Table 5: Class distribution of road edge points in the extracted surface, as # points (%) per sequence.

GT Class | Seq. 01 | Seq. 05 | Seq. 07 | Seq. 08
Road | 12,437 (94.0) | 10,093 (63.4) | 3,864 (76.5) | 19,856 (84.3)
Parking | 0 | 408 (2.6) | 448 (8.9) | 710 (3.0)
Sidewalk | 6 (0.0) | 5,243 (32.9) | 695 (13.8) | 2,599 (11.0)
Terrain | 519 (3.9) | 161 (1.0) | 42 (0.8) | 393 (1.7)
Other-ground | 276 (2.1) | 18 (0.1) | 0 | 0
Non-ground | 0 | 0 | 0 | 7 (0.0)

* The percentages show that the road edge points in the extracted surface belong predominantly to the “road” and “sidewalk” classes, as desired.
of accuracy.
S2b: Frame Classification: As an experiment, we
have trained two different frame classification mod-
els corresponding to the different levels of the frame/
road geometry class hierarchy (Figure 2, (Right)(ii)).
The first model is for classification into straight or
curved roads, and the second one is for classification
into straight roads, crossroads, and turnings. The vali-
dation accuracy on sequence 08 for both frame
classification models is shown in Table 4. Given the
higher accuracy at the first level of classification, we
have used the model for “straight” and “curved” roads
here. These accuracy results can be further improved
in future work by addressing the class imbalance at
both hierarchical levels (Table 2).
Figure 3: Results of implementing RoSELS on training sequences of SemanticKITTI (Behley et al., 2019), where 01, 05,
and 07 have been used for training our learning models, and 08 has been used for validation/testing. Row A follows the color
scheme as mentioned in Figure 4. For rows B, C, and D, wireframe meshes are shown in indigo and filled meshes are shown
in tan color.
S3: Edge Point Set Smoothing: The class distri-
bution of our road edge points after S3 is shown in
Table 5. We observe that most of the edge points be-
long predominantly to the “road” class, followed by
the “sidewalk” class. This shows that our approach
identifies edge points that are annotated as “road” or
“sidewalk”, as expected, and demonstrates the com-
bined efficiency of S2a, S2b, and S3. The edge point set
smoothing results for sequence 07 in Figure 4 show
that the noise in edge points is substantially reduced,
thus giving smooth road edges on the left and right of
the trajectory.
To quantify the error, we perform these three steps
on the “road” points in the ground truth (GT), and
compute the root mean square error between the edge
point sets computed using “road” (GT) and “ground”
(detected in S1) points. We perform this analysis ow-
ing to the absence of ground truth for the edge points
and the extracted surface. The RMSE values for se-
quences 01, 05, 07, and 08 are given in Figure 3.
Given that each frame has an extent of 51.2 m in front
of the vehicle and 25.6 m on either side (Behley et al.,
2019), we observe that the RMSE errors are relatively low.
S4: 3D Road Surface Extraction: The results of
the 3D extracted surface for trajectories of different
sequences are visualized in Figure 3. Our qualita-
tive results show that RoSELS works efficiently on
straight paths, closed trajectories, and complex trajec-
tories that predominantly have large contiguous seg-
ments of straight roads. Our results of surfaces ex-
Figure 4: Edge point set smoothing results on sequence 07
data trajectory (top row) with zoomed-in region inset (bot-
tom row). Red, purple and green points show the trajectory,
left and right edge points, respectively.
tracted using detected ground points and “road” (GT)
points are overlaid for demonstrating their similar-
ity. It can be seen that short segments of turning or
connections between straight road segments have also
been effectively covered in the triangulated meshes.
RoSELS has also been entirely implemented on
test sequence 15. As the GT for the test sequences is
not available, we qualitatively compare our extracted
road surface with the reference trajectory using sur-
face and point rendering (Figure 5(Left)). In all se-
quences, including 15, we observe that the road sur-
face mesh preserves the trajectory as its medial skele-
ton (Tagliasacchi et al., 2016), as expected.
However, RoSELS fails to extract the road surface
for the entire trajectory where: (a) the segments with
straight roads are highly fragmented, and (b) the sub-
sequences have comparable segments of curved and
straight roads. When edge points are not identified
for large segments of the road, as seen in training
sequence 03 (Figure 5(Right)), RoSELS extracts the
road surface partially. Surface extraction for a com-
plete trajectory for such scenarios requires an in-depth
study of curved roads, which is beyond the scope of
our current work.
5 CONCLUSIONS
We have proposed and implemented RoSELS, a novel
system for automating 3D road surface extraction for
a sequence of 3D automotive LiDAR point clouds. It
implements a five-step workflow. Firstly, with out-
lier removal and multiscale feature extraction, super-
vised learning using RFC is used for ground point de-
tection. Secondly, the height differences in ground
Figure 5: Results of RoSELS on (Left) a sample sequence
from the test set of SemanticKITTI (Behley et al., 2019),
for which annotations for semantic segmentation have not
been published; and (Right) a sample sequence where the
surface is only partially extracted.
points are used to detect road edge points using the
EM algorithm. Simultaneously, our frame classifica-
tion provides the road geometry by transfer learning
using ResNet-50 on top-view images. The fourth step
is for smoothing the edge point set in the sequence.
As the last step, the 3D road surface is extracted as
a triangulated mesh using 3D Delaunay tetrahedral-
ization. Our experiments on four sequences in Se-
manticKITTI with varying complexity in geometry
have yielded good results, which have been quali-
tatively and quantitatively verified. Thus, RoSELS
works successfully on trajectories with contiguous
straight roads, predominantly.
Although road surfaces across different trajecto-
ries are extracted with a high level of visual similarity
using the proposed algorithm, our approach fails to
extract road surfaces for the entire trajectory where
the segments do not have contiguous straight road ge-
ometry. Thus, extending our method to curved roads
is in the scope of future work. A more robust met-
ric and ground truth for validation are also open chal-
lenges for the road surface extraction application.
ACKNOWLEDGEMENT
We are grateful for the Machine Intelligence and Robotics
(MINRO) grant from the Government of Karnataka
for supporting our work through graduate fellowship
and conference support, and to Mayank Sati and Sunil
Karunakaran, who are with the Ignitarium Technol-
ogy Solutions, Pvt. Ltd., for their insightful discus-
sion and suggestions. We thank IIITB and the re-
search group at GVCL for their constant support.
REFERENCES
Arora, M., Wiesmann, L., Chen, X., and Stachniss, C.
(2021). Mapping the Static Parts of Dynamic Scenes
from 3D LiDAR Point Clouds Exploiting Ground
Segmentation. In 2021 European Conference on Mo-
bile Robots (ECMR), pages 1–6. IEEE.
Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke,
S., Gall, J., and Stachniss, C. (2021). Towards 3D
LiDAR-based semantic scene understanding of 3D
point cloud sequences: The SemanticKITTI dataset.
The International Journal of Robotics Research, 40(8-
9):959–967.
Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke,
S., Stachniss, C., and Gall, J. (2019). SemanticKITTI:
A dataset for semantic scene understanding of LiDAR
sequences. In Proceedings of the IEEE/CVF Interna-
tional Conference on Computer Vision, pages 9297–
9307.
Besl, P. J. and McKay, N. D. (1992). A method for regis-
tration of 3-D shapes. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 14(2):239–256.
Breiman, L. (2001). Random forests. Machine learning,
45(1):5–32.
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F.,
Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P.,
Gramfort, A., Grobler, J., Layton, R., VanderPlas, J.,
Joly, A., Holt, B., and Varoquaux, G. (2013). API de-
sign for machine learning software: experiences from
the scikit-learn project. In ECML PKDD Workshop:
Languages for Data Mining and Machine Learning,
pages 108–122.
Chen, X., Vizzo, I., Läbe, T., Behley, J., and Stach-
niss, C. (2021). Range image-based LiDAR local-
ization for autonomous vehicles. In 2021 IEEE In-
ternational Conference on Robotics and Automation
(ICRA), pages 5802–5808. IEEE.
Chollet, F. et al. (2015). Keras. https://keras.io.
Cortinhal, T., Tzelepi, G., and Aksoy, E. (2021). Sal-
saNext: Fast, Uncertainty-aware Semantic Segmen-
tation of LiDAR Point Clouds for Autonomous Driv-
ing. In 15th International Symposium, ISVC 2020, San
Diego, CA, USA, October 5–7, 2020, volume 12510.
Springer.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical So-
ciety: Series B (Methodological), 39(1):1–22.
Fischler, M. A. and Bolles, R. C. (1981). Random sample
consensus: A paradigm for model fitting with appli-
cations to image analysis and automated cartography.
Communications of the ACM, 24(6):381–395.
Gigli, L., Kiran, B. R., Paul, T., Serna, A., Vemuri, N., Mar-
cotegui, B., and Velasco-Forero, S. (2020). Road Seg-
mentation on low resolution LIDAR point clouds for
autonomous vehicles. In XXIV International Society
for Photogrammetry and Remote Sensing Congress,
Nice, France. ISPRS 2020.
Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., and Ben-
namoun, M. (2020). Deep learning for 3d point
clouds: A survey. IEEE Transactions on Pattern Anal-
ysis and Machine Intelligence.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z.,
Trigoni, N., and Markham, A. (2020). RandLA-Net:
Efficient semantic segmentation of large-scale point
clouds. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
11108–11117.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. Advances in neural information processing
systems, 25.
Kumari, B. and Sreevalsan-Nair, J. (2015). An interactive
visual analytic tool for semantic classification of 3D
urban LiDAR point cloud. In Proceedings of the 23rd
SIGSPATIAL International Conference on Advances
in Geographic Information Systems, pages 1–4.
Liu, Z., Liu, D., Chen, T., and Wei, C. (2013). Curb detec-
tion using 2D range data in a campus environment. In
2013 Seventh International Conference on Image and
Graphics, pages 291–296. IEEE.
Milioto, A., Vizzo, I., Behley, J., and Stachniss, C. (2019).
RangeNet++: Fast and accurate LiDAR semantic seg-
mentation. In 2019 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), pages
4213–4220. IEEE.
Ouyang, Z., Dong, X., Cui, J., Niu, J., and Guizani, M.
(2021). PV-EncoNet: Fast Object Detection Based on
Colored Point Cloud. IEEE Transactions on Intelli-
gent Transportation Systems, Early Access:1–12.
Paigwar, A., Erkent, Ö., Sierra-Gonzalez, D., and Laugier,
C. (2020). GndNet: Fast ground plane estimation and
point cloud segmentation for autonomous vehicles. In
2020 IEEE/RSJ International Conference on Intelli-
gent Robots and Systems (IROS), pages 2150–2156.
IEEE.
Rist, C. B., Schmidt, D., Enzweiler, M., and Gavrila, D. M.
(2020). SCSSnet: Learning Spatially-Conditioned
Scene Segmentation on LiDAR Point Clouds. In
2020 IEEE Intelligent Vehicles Symposium (IV), pages
1086–1093. IEEE.
Scott, G. J., England, M. R., Starms, W. A., Marcum, R. A.,
and Davis, C. H. (2017). Training deep convolutional
neural networks for land–cover classification of high-
resolution imagery. IEEE Geoscience and Remote
Sensing Letters, 14(4):549–553.
Shen, Z., Liang, H., Lin, L., Wang, Z., Huang, W., and Yu,
J. (2021). Fast Ground Segmentation for 3D LiDAR
Point Cloud Based on Jump-Convolution-Process. Re-
mote Sensing, 13(16):3239.
Shewchuk, J. R. (2002). Constrained Delaunay Tetrahe-
dralizations and Provably Good Boundary Recovery.
In Eleventh International Meshing Roundtable (IMR),
pages 193–204.
Stainvas, I. and Buda, Y. (2014). Performance evaluation
for curb detection problem. In 2014 IEEE Intelligent
Vehicles Symposium Proceedings, pages 25–30. IEEE.
Sui, L., Zhu, J., Zhong, M., Wang, X., and Kang, J. (2021).
Extraction of road boundary from MLS data using
laser scanner ground trajectory. Open Geosciences,
13(1):690–704.
Sullivan, C. B. and Kaszynski, A. (2019). PyVista: 3D plot-
ting and mesh analysis through a streamlined interface
for the Visualization Toolkit (VTK). Journal of Open
Source Software, 4(37):1450.
Tagliasacchi, A., Delame, T., Spagnuolo, M., Amenta, N.,
and Telea, A. (2016). 3D Skeletons: A State-of-the-
Art Report. In Computer Graphics Forum, volume 35,
pages 573–597. Wiley Online Library.
Weinmann, M., Jutzi, B., and Mallet, C. (2014). Seman-
tic 3D scene interpretation: A framework combin-
ing optimal neighborhood size selection with rele-
vant features. ISPRS Annals of the Photogrammetry,
Remote Sensing and Spatial Information Sciences,
2(3):181. doi: 10.5194/isprsannals-II-3-181-2014.
Zhao, L., Yan, L., and Meng, X. (2021). The Extraction
of Street Curbs from Mobile Laser Scanning Data in
Urban Areas. Remote Sensing, 13(12):2407.
Zhou, Q.-Y., Park, J., and Koltun, V. (2018a). Open3D:
A modern library for 3D data processing.
arXiv:1801.09847.
Zhou, Z., Zheng, Y., Ye, H., Pu, J., and Sun, G. (2018b).
Satellite image scene classification via ConvNet with
context aggregation. In Pacific Rim Conference on
Multimedia, pages 329–339. Springer.
Ziou, D. and Tabbone, S. (1998). Edge Detection Tech-
niques - an Overview. Pattern Recognition and Image
Analysis (Advances in Mathematical Theory and Ap-
plications), 8(4):537–559.