
Javier Borau Bernad
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0009-0009-5623-1688
Álvaro Ramajo Ballester
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0000-0001-9425-9408
José María Armingol Moreno
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0000-0002-3353-9956
No. 45 (2024), Computer Vision
DOI: https://doi.org/10.17979/ja-cea.2024.45.10737
Received: May 13, 2024. Accepted: Jul 1, 2024. Published: Jul 12, 2024

Abstract

In recent years, advances in Deep Learning and Computer Vision have driven the development of monocular detection algorithms applied to urban traffic management and safety, with the goal of optimizing data collection in urban environments for the smart cities of the future. However, these efforts have focused predominantly on extracting data from the vehicle's perspective, overlooking the advantages offered by cameras mounted on the infrastructure. This article studies the extraction of three-dimensional traffic data from this alternative perspective, exploiting an elevated viewpoint to avoid occlusions and obtain more accurate information about the size and position of vehicles. This research thus proposes a new methodological approach for integrating infrastructure-based computer vision systems into Intelligent Transportation Systems.
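The elevated-viewpoint geometry described above can be illustrated with a minimal sketch (not taken from the paper): projecting the eight corners of a 3D vehicle bounding box into the image of an infrastructure camera under a standard pinhole model. The camera height, tilt, intrinsics, and vehicle dimensions below are hypothetical values chosen only for illustration.

```python
import numpy as np

def box_corners(center, size, yaw):
    """8 corners of a 3D vehicle box: ground-plane center, (l, w, h) size, heading yaw."""
    l, w, h = size
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    z = np.array([ 0,  0,  0,  0,  1,  1,  1,  1]) * h
    Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                   [np.sin(yaw),  np.cos(yaw), 0],
                   [0,           0,            1]])
    return (Rz @ np.vstack([x, y, z])).T + np.asarray(center)

def project(points_w, K, R_wc, t_wc):
    """Pinhole projection of world points; (R_wc, t_wc) map world -> camera coordinates."""
    p_cam = points_w @ R_wc.T + t_wc
    assert np.all(p_cam[:, 2] > 0), "all points must lie in front of the camera"
    uv = p_cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

# Hypothetical setup: camera 6 m above the road, pitched 30 degrees down,
# observing a 4.5 x 1.8 x 1.5 m vehicle 10 m ahead on the ground plane.
theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R_wc = np.array([[1,  0,  0],
                 [0, -s, -c],
                 [0,  c, -s]])           # rows = camera axes expressed in the world frame
cam_center = np.array([0.0, 0.0, 6.0])   # world position of the camera (z up)
t_wc = -R_wc @ cam_center
K = np.array([[1000, 0, 960],
              [0, 1000, 540],
              [0,    0,   1]])           # assumed intrinsics (fx, fy, cx, cy)

corners = box_corners(center=(0.0, 10.0, 0.0), size=(4.5, 1.8, 1.5), yaw=0.0)
uv = project(corners, K, R_wc, t_wc)     # (8, 2) pixel coordinates of the box corners
```

From the elevated pose, all eight corners remain in front of the camera and visible in the image, which is precisely the occlusion advantage the infrastructure perspective provides compared with a low, vehicle-mounted camera.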

