Visual place recognition with omnidirectional images and early fusion techniques
DOI: https://doi.org/10.17979/ja-cea.2025.46.12239
Keywords: Mobile robotics, Visual localization, Omnidirectional cameras, Sensory fusion, Deep learning
Abstract
Omnidirectional cameras are a highly suitable option for mobile robot localization, given their ability to capture abundant, contextual scene information with a wide field of view. However, purely visual data are inherently sensitive to changes in the appearance of the environment, which can degrade the robustness of the system. To address this limitation, this paper proposes combining omnidirectional images with intrinsic features derived from them, such as average intensity or gradient magnitude, using early fusion techniques. The fused information is then processed by a convolutional neural network pre-trained on large-scale datasets for visual place recognition. The results demonstrate that enriching the visual information with these features significantly improves robustness, enabling precise and reliable localization in both indoor and outdoor environments, even under highly varied lighting conditions. The code is available at: https://github.com/MarcosAlfaro/LocalizacionVisualFusionTemprana/ .
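To illustrate the early-fusion idea described in the abstract, the sketch below stacks an RGB panorama with two channels derived from it (average intensity and Sobel gradient magnitude) and adapts a pre-trained CNN to accept the five-channel input. This is a minimal illustration under assumed choices: the PyTorch framework, the Sobel operator, the channel layout, and the ResNet-50 backbone are assumptions for the example, not the authors' actual implementation, which is in the linked repository.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    def gradient_magnitude(gray):
        # Sobel gradient magnitude of a (B, 1, H, W) grayscale batch
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        gx = nn.functional.conv2d(gray, kx, padding=1)
        gy = nn.functional.conv2d(gray, ky, padding=1)
        return torch.sqrt(gx**2 + gy**2 + 1e-8)

    def early_fusion(rgb):
        # Stack the RGB panorama with its average-intensity and
        # gradient-magnitude channels -> (B, 5, H, W)
        intensity = rgb.mean(dim=1, keepdim=True)
        grad = gradient_magnitude(intensity)
        return torch.cat([rgb, intensity, grad], dim=1)

    # Adapt a pre-trained backbone (ResNet-50 assumed here) so that its
    # first convolution accepts the five-channel fused input instead of RGB.
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    old = backbone.conv1
    backbone.conv1 = nn.Conv2d(5, old.out_channels, kernel_size=old.kernel_size,
                               stride=old.stride, padding=old.padding, bias=False)
    with torch.no_grad():
        backbone.conv1.weight[:, :3] = old.weight                        # reuse RGB filters
        backbone.conv1.weight[:, 3:] = old.weight.mean(1, keepdim=True)  # init extra channels
    backbone.fc = nn.Identity()  # keep the 2048-D global descriptor
    backbone.eval()

    panoramas = torch.rand(2, 3, 128, 512)           # dummy batch of panoramic images
    with torch.no_grad():
        descriptors = backbone(early_fusion(panoramas))  # (2, 2048) place descriptors

In a retrieval pipeline, localization would then reduce to comparing the query descriptor against a database of map descriptors, e.g. by nearest-neighbor search over cosine or Euclidean distance.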
Copyright (c) 2025 Marcos Alfaro, Juan José Cabrera Mora, Oscar Reinoso García, Arturo Gil Aparicio, Luis Payá Castelló

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.