Subproject 1

Deep learning for speech restoration from facial movement biosignals

SP1 Description

DeepRESTORE aims to investigate the use of Silent Speech Interfaces (SSIs) to restore communication in people who have lost the ability to speak. Silent Speech Interfaces are devices that capture non-acoustic biological signals generated during the speech production process and use them to predict the intended message. While SSIs have been investigated primarily in the context of speech recognition (silent speech-to-text), this project will also investigate direct speech synthesis techniques, which generate the corresponding speech waveform directly.

Among the biological signals to be investigated in the coordinated project, subproject 1 will capture two biosignals produced by movements of the speech apparatus: the electrical signals generated by the muscles of the face and neck involved in speech production (surface electromyography, sEMG) and video images of the face. Using a set of sensors placed on the face and throat together with a camera, and without recourse to the acoustic signal, the SSI device will decode these signals into the corresponding text (EMG-to-text) or acoustic (EMG-to-speech) message. For this purpose, deep learning techniques will be used.
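To make the envisaged pipeline concrete, the following minimal sketch (written in Python with the PyTorch library, which is assumed here; the EMGVideoToText name, all layer sizes, and the fusion strategy are illustrative assumptions, not the project's actual design) shows how multichannel sEMG and per-frame facial video features could be fused and decoded into character probabilities with a CTC objective for the EMG-to-text task:

import torch
import torch.nn as nn

class EMGVideoToText(nn.Module):
    def __init__(self, n_emg_channels=8, video_feat_dim=256, hidden=256, n_tokens=40):
        super().__init__()
        # Temporal convolutions over the raw multichannel sEMG stream.
        self.emg_encoder = nn.Sequential(
            nn.Conv1d(n_emg_channels, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        # Per-frame facial features (e.g. from a pretrained visual front end)
        # are simply projected; frame rates are assumed already aligned.
        self.video_proj = nn.Linear(video_feat_dim, hidden)
        # Recurrent layer models the fused articulatory dynamics.
        self.rnn = nn.GRU(2 * hidden, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        # CTC output head over characters (index 0 reserved for the blank).
        self.head = nn.Linear(2 * hidden, n_tokens)

    def forward(self, emg, video):
        # emg:   (batch, channels, emg_samples)
        # video: (batch, frames, video_feat_dim), frames == emg_samples // 4
        e = self.emg_encoder(emg).transpose(1, 2)   # (batch, frames, hidden)
        v = self.video_proj(video)                  # (batch, frames, hidden)
        x, _ = self.rnn(torch.cat([e, v], dim=-1))
        return self.head(x).log_softmax(-1)         # CTC log-probabilities

model = EMGVideoToText()
emg = torch.randn(2, 8, 400)       # 2 utterances, 8 sEMG channels
video = torch.randn(2, 100, 256)   # matching visual feature frames
log_probs = model(emg, video)      # (2, 100, 40), usable with nn.CTCLoss
                                   # after transposing to time-major order

In a direct synthesis (EMG-to-speech) variant, the same fused encoder could instead feed a regression head that predicts acoustic features, with a vocoder producing the final waveform.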

Although silent speech interfaces can be used in other contexts (for example, to maintain privacy during telephone conversations), our project focuses on restoring speech to people who have undergone a total laryngectomy. After a period of intensive training, these people typically acquire a so-called esophageal voice, whose characteristics differ markedly from healthy speech. Since they retain control over the speech articulators, silent speech data reflecting the articulator movements can be captured and converted into artificial speech.

sEMG-based SSI devices can significantly improve the quality of life of these people. During the project, the existing Spanish databases will be extended with data in an additional language (English) and made available to the research community. In addition, the use of state-of-the-art deep neural networks will be further explored, contributing new learning architectures. The project will be carried out in collaboration with international experts in the field of silent speech, and with the Association of Laryngectomees of Bizkaia, not only for data collection but also, and more importantly, for the evaluation and validation of the techniques developed.

Goals

Multilingual database for EMG- and video-based SSI studies in English and Spanish

EMG+video-to-text system for Spanish

EMG+video-based speech generation system for Spanish

Novel techniques for cross-language adaptation and transfer learning in EMG+video-based speech generation systems
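As an illustration of the cross-language transfer direction named in the last goal, the sketch below (again Python/PyTorch, building on the hypothetical EMGVideoToText model sketched earlier; the freezing recipe and the English token-inventory size of 45 are assumptions, not the project's actual adaptation method) adapts a model pretrained on Spanish to English by freezing the modality encoders and re-initializing the language-specific output head:

import torch.nn as nn
import torch.optim as optim

model = EMGVideoToText(n_tokens=40)
# ... load weights pretrained on the Spanish corpus here ...

# Freeze the modality encoders: the low-level mapping from articulator
# activity to features is assumed to transfer across languages.
for module in (model.emg_encoder, model.video_proj):
    for p in module.parameters():
        p.requires_grad = False

# Re-initialize the output head for the English token inventory;
# the fused temporal model (the GRU) also remains trainable.
model.head = nn.Linear(model.head.in_features, 45)  # 45: assumed English set

# Fine-tune only the parameters that still require gradients.
optimizer = optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

The design choice here is the standard one for small target-language corpora: reuse the language-independent articulatory representation and retrain only the layers tied to the target language's symbol set.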