Facial recognition in television news programs using deep convolutional networks
DOI:
https://doi.org/10.17979/ja-cea.2025.46.12046
Keywords:
Image processing, Neural networks, Machine learning, Artificial intelligence techniques, Computer vision
Abstract
This work proposes an artificial intelligence system based on deep neural networks that detects and recognizes specific individuals in images extracted from television news programs. To this end, a dataset of 12,800 images was created, focused mainly on national political figures. The proposed system automatically detects the individual in the scene using the YOLOv8 network and then recognizes them using the classifier that provides the highest certainty. Seven neural network architectures, suitably adapted to this specific problem, were compared: VGG-16, VGG-19, InceptionV3, Xception, ResNet-101, MobileNetV2, and DenseNet-169, with the last of these achieving the best average performance across all the tests carried out. The results confirm the feasibility of the system and lay the groundwork for future research.
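The recognition step described above, choosing the classifier that provides the highest certainty, can be illustrated with a minimal sketch. It assumes that "certainty" means the top softmax probability each classifier assigns to a cropped detection; the abstract does not specify the measure, so this is one plausible reading and not necessarily the authors' method. The function names, class labels, and logit values are hypothetical, and the YOLOv8 detection and cropping stage is omitted.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def most_certain_prediction(
    logits_by_model: Dict[str, List[float]],
    class_names: List[str],
) -> Tuple[str, str, float]:
    """Return (model name, predicted label, confidence) for the classifier
    whose top softmax probability is highest.

    Assumption: certainty is measured as the maximum softmax probability.
    """
    if not logits_by_model:
        raise ValueError("at least one classifier output is required")
    best: Tuple[str, str, float] = ("", "", -1.0)
    for model_name, logits in logits_by_model.items():
        probabilities = softmax(logits)
        top_index = max(range(len(probabilities)), key=probabilities.__getitem__)
        confidence = probabilities[top_index]
        if confidence > best[2]:
            best = (model_name, class_names[top_index], confidence)
    return best


# Hypothetical outputs of two classifiers for one cropped detection.
example_classes = ["person_a", "person_b", "person_c"]
example_logits = {
    "DenseNet-169": [4.0, 0.5, 0.1],
    "VGG-16": [1.0, 0.9, 0.8],
}
model, label, confidence = most_certain_prediction(example_logits, example_classes)
print(f"{model} -> {label} ({confidence:.3f})")
```

In a full pipeline, each cropped region returned by the detector would be passed to every trained classifier, and the resulting score vectors would be fed into `most_certain_prediction`.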
License
Copyright 2025 Ricardo Asensi-González, Pedro Javier Herrera

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.