Semantic segmentation of panoramic images and panoramic image based outdoor visual localization
Title:
Semantic segmentation of panoramic images and panoramic image based outdoor visual localization
Author:
Orhan, Semih, author.
Added Author:
Physical Description:
xiii, 71 leaves: charts; + 1 computer laser optical disc.
Abstract:
360-degree views are captured by full omnidirectional cameras and are generally represented as panoramic images. Unfortunately, these images suffer heavily from spherical distortion at the poles of the sphere. Previous studies on Convolutional Neural Networks (CNNs) have proposed several methods (e.g. equirectangular convolution) to alleviate spherical distortion. Inspired by these efforts, we developed an equirectangular version of the UNet model. We evaluated the semantic segmentation performance of the UNet model and its equirectangular version on an outdoor panoramic dataset. Experimental results showed that the equirectangular version of UNet performed better than UNet. In addition, we released the pixel-level annotated dataset, which is one of the first semantic segmentation datasets of outdoor panoramic images.

In visual localization, localizing perspective query images in a panoramic image dataset can alleviate the non-overlapping view problem between cameras. Generally, perspective query images are localized in a panoramic image database by generating 4 or 8 virtual gnomonic views of the panoramic images, which amounts to deforming the sphere into cube faces. Doing so reduces the search problem to a perspective-to-perspective search, but a non-overlapping view problem may still remain between the query and the gnomonic database images. Therefore, we propose localizing perspective query images directly in panoramic images by applying sliding windows on the last convolutional layer of CNNs. Features are extracted with R-MAC, GeM, and SFRS. Experimental results showed that the sliding window approach outperformed 4 gnomonic views and gave competitive results compared with 8 and 12 gnomonic views.

Any city-scale visual localization system has to be robust against long-term changes. Semantic information is more robust to such changes (e.g. the surface of a building), and depth maps provide geometric clues. In our work, we utilized semantic and depth information for pose verification, that is, we checked semantic and depth similarity to verify the poses (retrievals) obtained with an approach that uses only RGB image features. Semantic and depth information are represented with a self-supervised contrastive learning approach (SimCLR). Experimental results showed that pose verification with semantic and depth features improved the visual localization performance of the RGB-only model.
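The sliding-window idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`gem_pool`, `sliding_window_descriptors`, `best_window`), the window width, the stride, and the horizontal wrap-around of windows are all assumptions made for illustration, and the feature map is a plain nested list standing in for the last convolutional layer of a CNN. Only GeM pooling is shown; R-MAC and SFRS are not sketched.

```python
import math

def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling: ((1/N) * sum(v^p))^(1/p), values clamped at eps."""
    clamped = [max(v, eps) for v in values]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sliding_window_descriptors(fmap, win_w, stride):
    """Pool one descriptor per horizontal window of a panoramic feature map.

    fmap[c][y][x] is a (channels, height, width) feature map.
    Windows wrap around horizontally (assumption: the panorama covers 360 degrees).
    Returns a list of (x_start, descriptor) pairs.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    windows = []
    for x0 in range(0, width, stride):
        cols = [(x0 + dx) % width for dx in range(win_w)]
        desc = [
            gem_pool([fmap[c][y][x] for y in range(height) for x in cols])
            for c in range(channels)
        ]
        windows.append((x0, l2_normalize(desc)))
    return windows

def best_window(query_desc, windows):
    """Return (x_start, cosine similarity) of the window most similar to the query."""
    scored = [
        (x0, sum(q * d for q, d in zip(query_desc, desc)))
        for x0, desc in windows
    ]
    return max(scored, key=lambda item: item[1])
```

In this sketch a perspective query would be pooled into one descriptor with the same `gem_pool`, normalized, and compared against every window of every database panorama; the panorama holding the highest-scoring window would be returned as the retrieval.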
Added Author:
Uniform Title:
Thesis (Doctoral)--İzmir Institute of Technology: Computer Engineering.

İzmir Institute of Technology: Computer Engineering--Thesis (Doctoral).
Electronic Access:
Access to Electronic Version.