Visual place recognition with omnidirectional images and early fusion techniques

Authors

DOI:

https://doi.org/10.17979/ja-cea.2025.46.12239

Keywords:

Mobile robotics, Visual localization, Omnidirectional cameras, Sensory fusion, Deep learning

Abstract

Omnidirectional cameras are a highly suitable option for mobile robot localization, given their ability to capture abundant contextual scene information with a wide field of view. However, purely visual data is inherently sensitive to changes in environmental appearance, which can compromise system robustness. To address this limitation, this paper proposes combining omnidirectional images with intrinsic features derived from them, such as average intensity or gradient magnitude, using early fusion techniques. The fused information is then processed by a convolutional neural network, pre-trained on extensive datasets, for visual place recognition. The results demonstrate that enriching the visual information with these features significantly enhances system robustness, enabling precise and reliable localization in both indoor and outdoor environments, even under highly varied lighting conditions. The code is available at: https://github.com/MarcosAlfaro/LocalizacionVisualFusionTemprana/ .
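
As a rough illustration of the early-fusion idea described in the abstract (concatenating the omnidirectional image with intensity and gradient-magnitude channels before the network input), the following Python sketch uses OpenCV and PyTorch. The channel layout, normalization, and function name are illustrative assumptions, not the authors' exact pipeline:

    import cv2
    import numpy as np
    import torch

    def early_fusion(image_bgr: np.ndarray) -> torch.Tensor:
        """Early fusion: stack an omnidirectional image with intrinsic
        features derived from it (average intensity and gradient
        magnitude) along the channel axis.

        NOTE: illustrative sketch; the normalization and channel order
        are assumptions, not the paper's exact implementation.
        """
        img = image_bgr.astype(np.float32) / 255.0

        # Intensity: per-pixel average over the three color channels.
        intensity = img.mean(axis=2)

        # Gradient magnitude of the intensity channel via Sobel filters.
        gx = cv2.Sobel(intensity, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(intensity, cv2.CV_32F, 0, 1, ksize=3)
        grad_mag = np.sqrt(gx ** 2 + gy ** 2)
        grad_mag /= grad_mag.max() + 1e-8  # rescale to [0, 1]

        # Early fusion: concatenate color + intensity + gradient (5 channels).
        fused = np.dstack([img, intensity, grad_mag])

        # HWC -> CHW tensor with a batch dimension, ready for a CNN.
        return torch.from_numpy(fused).permute(2, 0, 1).unsqueeze(0)

    panorama = cv2.imread("panorama.png")  # hypothetical input image
    x = early_fusion(panorama)             # shape: (1, 5, H, W)

A backbone pre-trained on RGB images would then need its first convolution replaced with one accepting the fused channel count (e.g. torch.nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3)), optionally keeping the pre-trained weights for the three color channels.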

Author Biography

  • Marcos Alfaro Pérez, Universidad Miguel Hernández de Elche

    Marcos is a predoctoral researcher at Universidad Miguel Hernández de Elche. He is a member of the "Automatización, Robótica y Visión por Computador" research group, and his research focuses on the development of deep learning tools for the visual localization of mobile robots.

References

Ali-Bey, A., Chaib-Draa, B., Giguere, P., 2023. MixVPR: Feature mixing for visual place recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 2998–3007. DOI: 10.48550/arXiv.2303.02190

Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J., 2016. NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5297–5307. DOI: 10.48550/arXiv.1511.07247

Berton, G., Masone, C., Caputo, B., 2022. Rethinking visual geo-localization for large-scale applications. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4878–4888. DOI: 10.48550/arXiv.2204.02287

Cabrera, J. J., Santo, A., Gil, A., Viegas, C., Payá, L., 2024. MinkUNeXt: Point cloud-based large-scale place recognition using 3D sparse convolutions. arXiv preprint arXiv:2403.07593. DOI: 10.48550/arXiv.2403.07593

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. DOI: 10.48550/arXiv.2010.11929

Flores, M., Valiente, D., Gil, A., Peidró, A., Reinoso, O., Payá, L., 2021. Evaluación de descriptores locales en localización visual con imágenes ojo de pez. In: XLII Jornadas de Automática. Universidade da Coruña, Servizo de Publicacións, pp. 507–514. DOI: 10.17979/spudc.9788497498043.507

Huang, H., Liu, C., Zhu, Y., Cheng, H., Braud, T., Yeung, S.-K., June 2024. 360Loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 22314–22324. DOI: 10.48550/arXiv.2311.17389

Izquierdo, S., Civera, J., 2024. Optimal transport aggregation for visual place recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17658–17668. DOI: 10.48550/arXiv.2311.15937

Karypidis, E., Kakogeorgiou, I., Gidaris, S., Komodakis, N., 2024. DINO-Foresight: Looking into the future with DINO. arXiv preprint arXiv:2412.11673. DOI: 10.48550/arXiv.2412.11673

Lai, H., Yin, P., Scherer, S., 2022. AdaFusion: Visual-LiDAR fusion with adaptive weights for place recognition. IEEE Robotics and Automation Letters 7 (4), 12038–12045. DOI: 10.1109/LRA.2022.3210880

Liu, W., Fei, J., Zhu, Z., 2022. MFF-PR: Point cloud and image multi-modal feature fusion for place recognition. In: 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, pp. 647–655. DOI: 10.1109/ISMAR55827.2022.00082

Masone, C., Caputo, B., 2021. A survey on deep visual place recognition. IEEE Access 9, 19516–19547. DOI: 10.1109/ACCESS.2021.3054937

Pan, Y., Xie, J., Wu, J., Zhou, B., 2024. Camera-LiDAR fusion with latent correlation for cross-scene place recognition. IEEE Transactions on Industrial Electronics. DOI: 10.1007/978-3-031-72754-2_25

Payá, L., Reinoso, O., Berenguer, Y., Úbeda, D., 2016. Using omnidirectional vision to create a model of the environment: A comparative evaluation of global-appearance descriptors. Journal of Sensors 2016 (1), 1209507. DOI: 10.1155/2016/1209507

Pronobis, A., Caputo, B., 2009. COLD: The CoSy localization database. The International Journal of Robotics Research 28 (5), 588–594. DOI: 10.1177/0278364909103912

Santo, A., Gil, A., Valiente, D., Ballesta, M., Reinoso, O., 2023. Estimación de zonas transitables en nubes de puntos 3D con redes convolucionales dispersas. In: XLIV Jornadas de Automática. Universidade da Coruña, Servizo de Publicacións, pp. 732–737. DOI: 10.17979/spudc.9788497498609.732

Uy, M. A., Lee, G. H., 2018. PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4470–4479. DOI: 10.48550/arXiv.1804.03492

Yin, P., Jiao, J., Zhao, S., Xu, L., Huang, G., Choset, H., Scherer, S., Han, J., 2025. General place recognition survey: Towards real-world autonomy. IEEE Transactions on Robotics. DOI: 10.1109/TRO.2025.3550771

Published

2025-09-01

Section

Visión por Computador