Speech technologies for functional diversity

The group has developed a text-to-speech (TTS) conversion system that, particularly in its Basque version, has achieved wide adoption in the Basque Country. This project proposes investigating neural-network-based techniques in order to obtain a TTS system of higher quality than the current one.
Additionally, the project aims to improve the quality of personalized voices by investigating neural-network-based strategies for voice adaptation. The ability to personalize the voice is especially important when the voice donor has a speech pathology: the objective is to obtain synthetic voices that represent and identify the donor, so that they can be integrated into the donor's alternative communication device.
On the other hand, one of the greatest limitations faced by people with a speech pathology is the difficulty of being understood by automatic speech recognition (ASR) systems. The project will investigate different strategies, based on deep neural networks, for converting the speech signal so as to improve its intelligibility, particularly for ASR systems.
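Training such a conversion model typically requires pairing frames of pathological speech with frames of a reference voice, and dynamic time warping (DTW) is a standard way to obtain that frame alignment. The sketch below is a minimal, self-contained illustration over toy feature sequences, not the project's actual pipeline:

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns the list of (src_index, tgt_index) pairs on the optimal path,
    using Euclidean distance between frames.
    """
    n, m = len(src), len(tgt)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(src[i - 1] - tgt[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy example: the target is the source with one frame repeated,
# so the path must map source frame 1 onto target frames 1 and 2.
src = np.array([[0.0], [1.0], [2.0], [3.0]])
tgt = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])
pairs = dtw_align(src, tgt)
```

Once aligned, the paired frames can serve as input/output examples for training the neural conversion network.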

Automatic subtitling and search for spoken and written terms in multilingual audiovisual resources

The group has a large amount of audio and text data from the plenary sessions of the Basque Parliament, in Spanish and Basque. These data give the new group the opportunity to improve both the acoustic models already available and the language models of the previously developed recognition system, which is being used by other technological agents in the Basque Country (e.g., the dictionary of the Elhuyar Foundation). In this new phase, the aim is to carry out the steps needed to turn these data into a continuous speech recognizer for Basque that can be made available to other technological agents for commercialization. A database covering three years of parliamentary sessions will be compiled, from which approximately 4 hours with high-quality labeling will be drawn; this part will be made available to the research community through data distribution agencies (LDC, ELRA). The project also intends to develop a prototype for automatic subtitling of parliamentary sessions capable of detecting speaker changes and language changes, properly aligning the audio with both the manual transcripts (obtained from the official minutes of the sessions) and the transcripts obtained by automatic speech recognition.
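One simple way to anchor ASR output to the official minutes is to align the two word sequences and keep the ASR timestamps only where the texts agree, interpolating the rest later. A minimal sketch using Python's standard difflib (the word lists and timestamps below are invented for illustration):

```python
import difflib

def anchor_timestamps(asr_words, asr_times, minute_words):
    """Transfer ASR word start times onto the official-minutes text.

    Returns a list of (word, start_time_or_None); words on which the two
    sources disagree get no timestamp and must be filled in afterwards.
    """
    matcher = difflib.SequenceMatcher(a=minute_words, b=asr_words, autojunk=False)
    result = [(w, None) for w in minute_words]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            result[block.a + k] = (minute_words[block.a + k], asr_times[block.b + k])
    return result

# Invented example: the ASR misrecognized one word of the minutes.
minutes = ["eusko", "legebiltzarraren", "osoko", "bilkura"]
asr     = ["eusko", "legebiltzarra",    "osoko", "bilkura"]
times   = [0.0, 0.6, 1.9, 2.4]
aligned = anchor_timestamps(asr, times, minutes)
```

In a real subtitling pipeline, the same anchoring idea would be applied at a larger scale and combined with forced alignment of the unmatched stretches.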
The groups will also collaborate in the field of automatic information extraction, as both have experience in this area. The general objective is to improve both voice and text search in the currently available systems, mainly by incorporating deep neural networks into the feature-extraction stage (phonetic posteriors or bottleneck features, BNFs).
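As a deliberately simplified illustration of query-by-example spoken term search over such features, a query represented as a sequence of posterior-like vectors can be slid over an utterance and scored by average cosine similarity (real systems would typically use DTW-based matching; all data below are synthetic one-hot "posteriors"):

```python
import numpy as np

def search_term(query, utterance, threshold=0.95):
    """Slide a query (frames x dims) over an utterance of posterior-like
    vectors; return (start_frame, score) pairs whose average per-frame
    cosine similarity exceeds the threshold."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utterance / np.linalg.norm(utterance, axis=1, keepdims=True)
    hits = []
    for start in range(len(u) - len(q) + 1):
        window = u[start:start + len(q)]
        score = float(np.mean(np.sum(q * window, axis=1)))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Synthetic utterance: each frame is a one-hot vector over 3 "phonetic"
# classes; the query is the pattern that starts at frame 2.
labels = [0, 1, 2, 0, 2, 1, 0, 1]
utt = np.eye(3)[labels]
query = utt[2:5].copy()
hits = search_term(query, utt)
```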

Detection and classification of noise in vehicles
The group’s research in this line arises from the collaboration with Mercedes-Benz Vitoria, with a strongly application-oriented focus on the plant’s manufacturing and quality-control processes.
Within its procedures for controlling the sound quality of vehicles, the presence of buzz, squeak and rattle (BS&R) noises is decisive, and for this reason the line of research is directed towards the application of neural networks to the detection of these sound events.
The most important challenge is the application of these detection systems in a real driving environment, which is where the BS&R nuisance noises occur. The automatic system has to be extraordinarily robust in order to operate under very unfavorable signal-to-noise conditions in highly variable environments: different vehicle types, different driving conditions and different noise sources.
The objective is to create an automatic hearing system (machine listening) that is integrated into the plant’s quality-control and functional-analysis procedures, and that facilitates the detection, classification and localization of the undesirable noises that affect the sound quality of the vehicle.
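The basic framing-and-detection stage of such a pipeline can be illustrated with a crude energy-based stand-in for the neural detectors discussed above: frame the signal, compute short-time energy, and flag frames that exceed an adaptive threshold. The signal below is synthetic (low-level noise with one rattle-like burst), and the threshold rule is an illustrative simplification, not the project's method:

```python
import numpy as np

def detect_transients(signal, frame_len=256, hop=128, k=4.0):
    """Flag frames whose short-time energy exceeds k times the median
    frame energy of the recording (median = robust noise-floor estimate)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    threshold = k * np.median(energy)
    return np.nonzero(energy > threshold)[0]

# Synthetic cabin noise (1 s at 16 kHz) with a rattle-like burst at ~0.5 s.
rng = np.random.default_rng(1)
sig = 0.01 * rng.standard_normal(16000)
sig[8000:8300] += 0.5 * rng.standard_normal(300)
frames = detect_transients(sig)
```

A neural classifier would replace the energy threshold, labeling each flagged segment as buzz, squeak, rattle or background.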