Project Title: BUCEADOR: Information search engine for multilingual audiovisual contents.
Financed by: MICINN
Project manager: Eva Navas
Participants: TALP (UPC), GPS (Uni. Vigo), Aholab (UPV/EHU)
Team: Eva Navas, Inma Hernáez, Iñaki Sainz, Ibon Saratxaga, Daniel Erro, Iñaki Gaminde, Jon Sánchez, Juanjo Igarza
Begin date: Jan-2010
Final date: Dec-2012
For more information about the project, please visit the project homepage.
Abstract:
BUCEADOR is a project focused on three lines of work: advanced research in all the core Spoken Language Technologies (SLT), namely diarization, speech recognition, speech machine translation and text-to-speech conversion; research in both cross-language and speech-oriented information retrieval technologies for voice search applications; and the successful joint integration of all of them in a multilingual and multimodal information retrieval system.
The goal of the project is to improve all the SLT components and voice search applications in order to enhance human-machine and human-to-human communication among all the official languages spoken in Spain, as well as between these languages and English.
The project aims to advance research in each of these technologies. Examples of the planned work include exploring new techniques for speaker diarization, incorporating confidence measures for unconstrained conversational speech into automatic speech recognition, integrating linguistic knowledge into the statistical approach to spoken language translation, developing new acoustic and prosodic models for generating expressive synthetic speech, and implementing new strategies for voice search information retrieval. The project also plans to participate in European and American (NIST) competitive evaluation campaigns.
To demonstrate the advances in these technologies and their successful joint integration, the project will build a showcase: a search engine for multilingual audiovisual content. Specifically, it will use broadcast news from several TV, radio, and internet channels in all the official languages of Spain (Spanish, Catalan, Basque and Galician), plus English.