Abstract:
Road scene understanding has attracted increasing research attention in
recent years. Developing road scene understanding capabilities that are applicable to real-world
road scenarios remains difficult, largely due to the cost and
complexity of achieving human-level scene understanding, at which road
scene elements can be segmented with a mean intersection over union score close to 1.0. There is a need
for a more unified approach to road scene segmentation for use in self-driving systems. Previous
works have demonstrated how deep learning methods can be combined to improve the segmentation
and perception performance of road scene understanding systems. This paper proposes a novel
segmentation system that uses fully connected networks, attention mechanisms, and multiple-input
data stream fusion to improve segmentation performance. Results show performance comparable
to previous works, achieving a mean intersection over union of 87.4% on the Cityscapes dataset.
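To make the proposed design concrete, the sketch below shows one way an attention mechanism can fuse two input data streams in a PyTorch-style segmentation pipeline. The AttentionFusion module, the channel width, and the pairing of an RGB stream with an auxiliary stream are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of attention-gated two-stream fusion for segmentation.
# Module names, channel widths, and the RGB + auxiliary pairing are
# illustrative assumptions; the paper's architecture may differ.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuses two feature streams via a learned per-pixel attention gate."""
    def __init__(self, channels: int):
        super().__init__()
        # Gate maps the concatenated streams to per-pixel weights in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # Weight the two streams with complementary attention maps.
        alpha = self.gate(torch.cat([rgb_feat, aux_feat], dim=1))
        return alpha * rgb_feat + (1.0 - alpha) * aux_feat

# Example: fuse 64-channel feature maps from two input streams.
fusion = AttentionFusion(channels=64)
fused = fusion(torch.randn(1, 64, 32, 64), torch.randn(1, 64, 32, 64))
print(fused.shape)  # torch.Size([1, 64, 32, 64])
```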
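For reference, mean intersection over union averages the per-class ratio of overlap to union between predicted and ground-truth label maps, so a score of 1.0 corresponds to a perfect segmentation. A minimal sketch of the metric follows; the class count and label maps are illustrative.

```python
# Minimal sketch of the mean intersection over union (mIoU) metric.
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Average per-class IoU = |pred & target| / |pred | target| over present classes."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious))

# Example: two 4x4 label maps with 3 classes.
pred = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 2, 2], [2, 2, 2, 2]])
target = np.array([[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 2, 2], [2, 2, 0, 2]])
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.758
```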
Description:
DATA AVAILABILITY STATEMENT: Two datasets are referenced in this paper. The Cityscapes dataset is
available in the Cityscapes web repository [21]. The CARLA dataset was custom-recorded from the
CARLA simulator [44] and can be obtained from the first author upon request. The main training
scripts used to create the road scene segmentation model will be made available with this paper.