Integration of RGB cameras and a 3D LiDAR for traffic participant detection in CARLA and ROS2

Authors

DOI:

https://doi.org/10.17979/ja-cea.2025.46.12062

Keywords:

Autonomous vehicles, Automotive sensors, Neural networks, Sensor data fusion, Perception and sensing

Abstract

This paper improves the detection of traffic participants for autonomous vehicles by fusing data from onboard RGB cameras and a 3D LiDAR. To this end, the BEVFusion model is employed, which integrates the features of these sensors in a space shared by both, called bird’s eye view. The model is trained and evaluated on a synthetic dataset generated in the CARLA (Car Learning to Act) simulator, which includes different types of traffic participants such as cars, pedestrians, trucks, buses and motorbikes. Metrics such as average precision and the average translation, scale, orientation and velocity errors are analysed. In addition, the influence of the distance between the traffic participants and the sensorized vehicle on the average precision is analysed quantitatively. Finally, the model is evaluated qualitatively with online data from CARLA, processed through a BEVFusion wrapper in ROS2 (Robot Operating System), an environment for developing robotic applications.
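
As a minimal sketch of the online evaluation stage described above, the following ROS2 (rclpy) node pairs camera images with LiDAR point clouds by approximate timestamp before handing them to a detector. The topic names, node name and the inference placeholder are illustrative assumptions and do not reproduce the interface of the actual BEVFusion wrapper (linClubs, 2023), which is a C++/TensorRT implementation.

# Sketch only: topic names and the detector call are hypothetical,
# not the real BEVFusion-ROS-TensorRT interface.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, PointCloud2
from message_filters import Subscriber, ApproximateTimeSynchronizer

class FusionInputNode(Node):
    def __init__(self):
        super().__init__('fusion_input_node')
        # Hypothetical topics as published by a CARLA-ROS bridge.
        cam = Subscriber(self, Image, '/carla/ego/rgb_front/image')
        lidar = Subscriber(self, PointCloud2, '/carla/ego/lidar')
        # Pair messages whose stamps differ by less than 50 ms; both
        # sensors run on the simulation clock, so a small slop suffices.
        self.sync = ApproximateTimeSynchronizer([cam, lidar],
                                                queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_pair)

    def on_pair(self, image_msg, cloud_msg):
        # Placeholder: the real wrapper converts both messages to tensors,
        # runs the exported BEVFusion engine and publishes 3D bounding boxes.
        self.get_logger().info('paired image and point cloud at t=%d s'
                               % image_msg.header.stamp.sec)

def main():
    rclpy.init()
    rclpy.spin(FusionInputNode())
    rclpy.shutdown()

if __name__ == '__main__':
    main()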

References

Bai, X., Hu, Z., Zhu, X., Huang, Q., Chen, Y., Fu, H., Tai, C.-L., 2022. TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 1090–1099.

Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O., 2020. nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 11621–11631. DOI: 10.1109/CVPR42600.2020.01164

Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V., 2017. CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, California, USA, 1–16. DOI: 10.48550/arXiv.1711.03938

Huang, K., Shi, B., Li, X., Li, X., Huang, S., Li, Y., 2022. Multi-modal sensor fusion for auto driving perception: A survey. DOI: 10.48550/arXiv.2202.02703

linClubs, 2023. BEVFusion-ROS-TensorRT. https://github.com/linClubs/BEVFusion-ROS-TensorRT.

Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D. L., Han, S., 2023. BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation. IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2774–2781. DOI: 10.1109/ICRA48891.2023.10160968

Mehr, G., Eskandarian, A., 2025. SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset. DOI: 10.48550/arXiv.2502.01894

Montenegro, J., García-Guillén, A., Castro, F. M., Martínez, J. L., Morales, J., 2024. Detección de participantes del tráfico en entornos urbanos sobre imágenes RGB y nubes de puntos 3D [Traffic participant detection in urban environments from RGB images and 3D point clouds]. Jornadas de Automática 45, Málaga, Spain. DOI: 10.17979/ja-cea.2024.45.10870

Moreau, J., Ibanez-Guzman, J., 2023. Emergent Visual Sensors for Autonomous Vehicles. IEEE Transactions on Intelligent Transportation Systems 24 (5), 4716–4737. DOI: 10.1109/TITS.2023.3248483

NVIDIA, 2023. CUDA-BEVFusion. https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-BEVFusion.

Philion, J., Fidler, S., 2020. Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, Proceedings, Part XIV, 194–210. DOI: 10.1007/978-3-030-58568-6_12

Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 779–788. DOI: 10.1109/CVPR.2016.91

Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., Li, H., 2020. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 10526–10535. DOI: 10.1109/CVPR42600.2020.01054

Song, Z., He, Z., Li, X., Ma, Q., Ming, R., Mao, Z., Pei, H., Peng, L., Hu, J., Yao, D., et al., 2023. Synthetic datasets for autonomous driving: A survey. IEEE Transactions on Intelligent Vehicles 9 (1), 1847–1864. DOI: 10.1109/TIV.2023.3331024

Srivastav, A., Mandal, S., 2023. Radars for autonomous driving: A review of deep learning methods and challenges. IEEE Access 11, 97147–97168. DOI: 10.48550/arXiv.2306.09304

Urmila, O., Megalingam, R. K., 2020. Processing of LiDAR for Traffic Scene Perception of Autonomous Vehicles. International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 298–301. DOI: 10.1109/ICCSP48568.2020.9182175

Vora, S., Lang, A. H., Helou, B., Beijbom, O., 2020. PointPainting: Sequential Fusion for 3D Object Detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 4604–4612. DOI: 10.1109/CVPR42600.2020.00466

Wang, Y., Mao, Q., Zhu, H., Deng, J., Zhang, Y., Ji, J., Li, H., Zhang, Y., 2023. Multi-Modal 3D Object Detection in Autonomous Driving: A Survey. International Journal of Computer Vision 131 (8), 2122–2152. DOI: 10.1007/s11263-023-01784-z

Xu, D., Anguelov, D., Jain, A., 2018. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 244–253. DOI: 10.1109/CVPR.2018.00033

Yang, B., Li, J., Zeng, T., 2025. A Review of Environmental Perception Technology Based on Multi-Sensor Information Fusion in Autonomous Driving. World Electric Vehicle Journal 16 (1), 20. DOI: 10.3390/wevj16010020

Published

2025-09-01

Section

Robótica