Face recognition in television news broadcasts using deep convolutional networks

DOI:

https://doi.org/10.17979/ja-cea.2025.46.12046

Keywords:

Image processing, Neural networks, Machine learning, Artificial intelligence techniques, Computer vision

Abstract

This work proposes an artificial intelligence system based on deep neural networks that detects and recognizes specific individuals in images extracted from television news broadcasts. To this end, a dataset of 12,800 images was created, focused mainly on national political figures. The proposed system automatically detects the individual in the scene using the YOLOv8 network and then recognizes them using whichever classifier provides the highest certainty. Seven neural network architectures, suitably adapted to this specific problem, were compared: VGG-16, VGG-19, InceptionV3, Xception, ResNet-101, MobileNetV2 and DenseNet-169, with the last of these achieving the best average performance across all tests carried out. The results confirm the viability of the system and lay the groundwork for future research.
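
The recognition step, in which the label comes from the classifier with the highest certainty, can be illustrated with a minimal sketch. It assumes that "certainty" means the largest softmax probability produced by each classifier for a detected person crop; the abstract does not define the measure, so this reading is an assumption. The function name, class labels and probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def most_confident_prediction(
    probabilities: Dict[str, List[float]],
    class_names: List[str],
) -> Tuple[str, str, float]:
    """Return (model name, predicted label, confidence) for the classifier
    whose top-1 probability is the highest.

    Assumption: "certainty" is read here as the largest softmax probability;
    the paper may define it differently.
    """
    best = None
    for model_name, probs in probabilities.items():
        if len(probs) != len(class_names):
            raise ValueError(f"{model_name}: expected {len(class_names)} scores")
        # Index of the highest-scoring class for this classifier.
        top_index = max(range(len(probs)), key=probs.__getitem__)
        candidate = (model_name, class_names[top_index], probs[top_index])
        # Keep the candidate only if it is strictly more confident.
        if best is None or candidate[2] > best[2]:
            best = candidate
    if best is None:
        raise ValueError("no classifier outputs were provided")
    return best


# Hypothetical softmax outputs for one detected person crop.
class_names = ["person_a", "person_b", "person_c"]
outputs = {
    "DenseNet-169": [0.05, 0.90, 0.05],
    "VGG-16": [0.60, 0.30, 0.10],
    "MobileNetV2": [0.20, 0.70, 0.10],
}
result = most_confident_prediction(outputs, class_names)
print(result)  # ('DenseNet-169', 'person_b', 0.9)
```

In this sketch, ties go to the classifier listed first. A real pipeline would obtain the probability vectors from the trained networks after cropping the region returned by the detector.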

Published

01-09-2025

Section

Computer Vision