Title: Controllable and Adaptable Statistical Parametric Speech Synthesis Systems
Speaker: Mark Gales
Date: Wednesday, 25th of June 2014 [DELAYED]
One of the advantages of statistical parametric speech synthesis approaches over concatenative schemes is that it is straightforward to modify the synthesis system to change, for example, the speaker characteristics or emotional state of the generated speech. This talk will review some of the approaches that have been used to train and modify the parameters of an HMM-based speech synthesiser to achieve this goal. There are a number of ways in which the target speech configuration can be specified: manually; from some external agent or information; or to reproduce the emotions or characteristics of a limited quantity of audio. Schemes based on average-voice models (AVMs), cluster adaptive training (CAT), and combinations of these schemes will be described that support these situations. In addition, applications of these schemes to building synthesis systems from diverse data, e-book reading, and building a video-realistic expressive talking head will be discussed.