Project
Decoding speech directly from brain signals is a rapidly advancing yet highly challenging field at the intersection of neuroscience, AI, and assistive communication. While invasive neuroprosthetic systems based on intracranial recordings have achieved impressive speech reconstruction performance in clinical populations, their surgical risks, limited scalability, and heavy training requirements highlight the need for non-invasive alternatives such as EEG, MEG, and fMRI. However, current non-invasive work remains limited, often focusing on low-level motor features and monolingual settings. The proposed project addresses these gaps by leveraging semantic representations, multilingual modeling (Spanish–Basque), self-supervised learning, and explainable AI to build more robust, interpretable, and scalable brain-to-speech systems. By combining multimodal neuroimaging with generative modeling, the project aims to advance both scientific understanding and real-world assistive applications.
SP1 investigates the decoding of speech-related brain activity from MEG recordings, with the goal of reconstructing both spoken and written language from non-invasive neural signals. Experiments will span four conditions: listening, overt articulation, silent articulation, and imagined speech. Data acquisition will take place at the facilities of SP3, leveraging their MEG infrastructure, participant recruitment protocols, and technical expertise. A central methodological challenge is the phoneme-level annotation of brain activity in conditions without acoustic output (e.g., imagined or silent speech). To address this, SP1 will develop and evaluate a combination of articulatory modeling, motor trajectory inference, and statistical alignment techniques to estimate likely phoneme onsets. Decoding pipelines will initially focus on command identification, a controlled classification task, before progressing to open-ended decoding, where outputs are reconstructed as continuous speech or text. For this, SP1 will explore transformer-based architectures such as Wav2vec2, HuBERT, or Whisper for audio generation, and large language models such as GPT and LLaMA variants for text generation. Methodological development will be closely coordinated with SP2, allowing for architectural comparisons and the cross-application of models to neural data collected with different recording modalities. Particular attention will be paid to cross-condition generalization and interpretability. This integrative approach enables the study of shared and condition-specific neural representations of speech and supports the development of realistic, non-invasive BCIs capable of generating intelligible language in both spoken and textual forms.
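To make the initial command-identification stage concrete, the sketch below trains a simple linear decoder on simulated MEG epochs. It is a minimal sketch only: the channel count, epoch length, number of commands, and the choice of a logistic-regression baseline are illustrative assumptions, not SP1's actual acquisition parameters or final architecture.

```python
# Minimal sketch of command identification as a controlled classification task
# on MEG epochs. Shapes and label counts are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 200, 306, 250   # e.g., 306-sensor MEG, 250 samples/epoch
X = rng.standard_normal((n_epochs, n_channels, n_times))  # stand-in for preprocessed epochs
y = rng.integers(0, 4, size=n_epochs)                     # 4 hypothetical commands

# Flatten each epoch into a feature vector; a regularized linear decoder is a
# common strong baseline before moving to transformer-based open-ended decoding.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X.reshape(n_epochs, -1), y, cv=5)
print(f"Cross-validated command accuracy: {scores.mean():.2f}")
```

On real data, the random arrays would be replaced by preprocessed, epoch-segmented MEG recordings, and the closed command set would be defined by the experimental protocol.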
Hitz Center
(University of the Basque Country)
Eva Navas
SP2 focuses on developing robust neural decoding models that integrate semantic knowledge and explainability. Neural data will be collected using three techniques: sEEG, EEG, and fMRI. sEEG data will be obtained from patients implanted with deep electrodes for clinical monitoring at the Epilepsy Surgery Unit of the Hospital Universitario Virgen de las Nieves in Granada (HUVN), while EEG and fMRI will be recorded from healthy, Spanish-speaking adults at the premises of the CIMCYC research centre at the University of Granada. Experimental tasks will be designed in collaboration with SP3 to target speech production and perception across varying linguistic complexities, including phoneme articulation, common AAC commands, phonetically balanced sentences, and semantic-level tasks such as picture naming and image description. Multilingual tasks will also be implemented in collaboration with partners in the Netherlands to support the construction of the Spanish–Dutch corpus. The datasets will be anonymized and prepared for open sharing. Building on these data, SP2, with the involvement of SP1, will develop decoding models that are first pre-trained with self-supervised learning on large-scale unlabelled EEG and sEEG recordings and then fine-tuned for speech and language tasks. Generative models such as latent diffusion and encoder-decoder architectures will be conditioned on neural activity to produce speech or text outputs. Finally, explainable AI techniques will be integrated, including saliency-based methods and feature attribution tools, to interpret model decisions. This will improve model transparency, support clinical trust, and provide insight into how specific brain regions and neural patterns contribute to phonological and semantic processing, as well as into the neural dynamics underlying these cognitive tasks.
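As an illustration of the saliency-based explainability step, the sketch below computes a gradient saliency map for a toy convolutional EEG decoder: the gradient of the predicted class score with respect to the input indicates which channels and time points most influenced the decision. The decoder architecture and all signal dimensions are illustrative assumptions, not the models SP2 will develop.

```python
# Minimal gradient-saliency sketch for a neural decoder. The tiny decoder and
# the 64-channel, 500-sample input segment are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Stand-in for an EEG/sEEG decoding model: (channels, time) -> class logits."""
    def __init__(self, n_channels=64, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = TinyDecoder().eval()
x = torch.randn(1, 64, 500, requires_grad=True)  # one EEG segment: 64 ch, 500 samples

logits = model(x)
logits[0, logits.argmax()].backward()            # gradient of the predicted class score
saliency = x.grad.abs().squeeze(0)               # (channels, time) importance map
per_channel = saliency.mean(dim=1)               # aggregate importance per channel
print("Most influential channel:", int(per_channel.argmax()))
```

With sensor geometry attached, a per-channel map of this kind can be projected back onto electrode locations, which is what makes saliency useful for relating model decisions to specific brain regions.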
University of Granada
Hospital Universitario Virgen de las Nieves
José A. Gonzalez
The goal of SP3 is to investigate the extent to which semantic networks overlap across languages in bilingual individuals. To this aim, the team will decode perceived speech from MEG and fMRI signals using a transformer-based multimodal contrastive model.
The neuroimaging data will be collected from 20 balanced Basque–Spanish bilingual subjects (right-handed young adults), who will listen to 15-minute speech passages recorded in both Basque and Spanish. Participants will complete four MEG sessions and four fMRI sessions, each lasting approximately 2 hours. This approach will ensure a substantial amount of neural data in both languages and across the two neuroimaging modalities. Notably, an extra fMRI session will be used to administer two functional localizer tasks (a motor/articulatory localizer and a language-processing localizer), which will help identify brain regions involved in speech production and language processing, thereby facilitating the interpretation of the neural signals recorded during the main task. All neural data will be preprocessed using state-of-the-art analysis pipelines.
To investigate the relationship between neural activity and speech meaning, the team will develop computational models trained on two complementary sources of information: (a) the speech stimuli, from which contextualized semantic representations will be extracted via pre-trained language models such as GPT-2, XLM-R, mT5, and LaBSE; and (b) the MEG and fMRI data collected during the listening task.
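As a minimal sketch of source (a), the example below extracts contextualized embeddings for a single stimulus sentence with GPT-2 through the Hugging Face transformers library. Mean-pooling over tokens and the use of the final hidden layer are illustrative choices, and the sentence itself is a made-up example.

```python
# Minimal sketch: contextualized semantic embeddings for a stimulus transcript
# using a pre-trained language model (GPT-2 here; XLM-R, mT5, or LaBSE would
# follow the same pattern). Pooling and layer choice are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

text = "El niño escucha una historia."          # example Spanish stimulus sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, n_tokens, 768)

# One embedding per token; mean-pooling yields a sentence-level representation
# that can later be aligned with the MEG/fMRI responses to the same stimulus.
sentence_embedding = hidden.squeeze(0).mean(dim=0)
print(sentence_embedding.shape)                  # torch.Size([768])
```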
The team will therefore map the semantic features extracted from the speech stimuli to the associated brain signals. To integrate the distinct spatiotemporal characteristics of MEG and fMRI, SP3 will implement advanced fusion strategies, including cross-modal transformers, shared latent space models (e.g., Deep CCA, multimodal VAEs), and contrastive learning frameworks inspired by models such as CLIP. After mapping the stimulus representations to the neural data, the team will examine the extent to which neural responses elicited by the two languages reflect the activation of shared semantic networks. In particular, SP3 will assess whether such networks can be leveraged to support cross-linguistic brain-to-speech decoding. This analysis will allow SP3 to evaluate the degree to which the neural organization of semantic information generalizes across languages in bilingual individuals, as well as the potential feasibility of multilingual brain–computer interfaces.
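To make the CLIP-inspired contrastive framing concrete, the sketch below aligns paired neural and semantic embeddings in a shared latent space using a symmetric InfoNCE loss. All dimensionalities, the linear projections, and the temperature value are illustrative assumptions; random tensors stand in for real MEG/fMRI features and language-model embeddings.

```python
# Minimal CLIP-style contrastive alignment sketch: matching (neural, semantic)
# pairs are pulled together in a shared space, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F

batch, d_neural, d_sem, d_shared = 32, 1024, 768, 256
neural = torch.randn(batch, d_neural)       # stand-in for MEG/fMRI features
semantic = torch.randn(batch, d_sem)        # stand-in for LM sentence embeddings

proj_n = torch.nn.Linear(d_neural, d_shared)
proj_s = torch.nn.Linear(d_sem, d_shared)

z_n = F.normalize(proj_n(neural), dim=-1)
z_s = F.normalize(proj_s(semantic), dim=-1)

# Symmetric InfoNCE: true pairs sit on the diagonal of the similarity matrix.
logits = z_n @ z_s.t() / 0.07               # temperature 0.07, as in CLIP
targets = torch.arange(batch)
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
print(f"Contrastive alignment loss: {loss.item():.3f}")
```

In the cross-linguistic setting, the same shared space would receive neural responses to Basque and Spanish stimuli, so the degree of overlap between the two languages' embeddings becomes directly measurable.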
Finally, SP1 and SP2 will contribute decoding pipelines and modelling tools that can be leveraged to address these objectives, as well as AI explainability methods that will facilitate the interpretation of the learned neural–semantic representations.
BCBL Basque Center on Cognition, Brain and Language
Nicola Molinaro