Gemma Meseguer: Speaker Diarization in Broadcast Audio using NVIDIA NeMo Models

Student: Gemma Meseguer Castillo
Advisors: Christoforos Souganidis, Eva Navas Cordón and Inma Hernáez Rioja
Thesis Defense Date: 03/07/2025

Within the field of Speech Processing, Speaker Diarization has become increasingly important in recent years. Frameworks such as NVIDIA NeMo offer Speaker Diarization configurations aimed at domains such as telephone conversations or recorded meetings. The broadcast domain, however, is not as well supported.
This work investigates whether NVIDIA NeMo can be used effectively for Broadcast Speaker Diarization, focusing in particular on its Voice Activity Detection (VAD) and Multi-scale Diarization Decoder (MSDD) modules, and which factors influence the performance of the system.
Five experiments were performed, combining all possibilities of VAD models and diarizers under two configurations: general and telephonic. The best result using only NeMo models was obtained by combining the fine-tuned Frame-VAD model with the fine-tuned MSDD model under the telephonic configuration (45.86%). The best combination across all experiments used the Pyannote Segmentation Model as an external VAD with the clustering diarizer, also under the telephonic configuration (38.80%); post-processing further improved this result to 23.57%. Finally, a statistical analysis confirmed that the television genre and the number of speakers significantly influence the performance of the Speaker Diarization system.

