Embodied conversational agents in verbal and non-verbal communication
by David House (KTH - Royal Institute of Technology, Sweden)

In face-to-face communication both visual and auditory information play an obvious and significant role. Traditionally in phonetic research the auditory effects of speech production have been the primary object of study. However, when it comes to the non-verbal aspects of speech communication the primary nature of acoustics is not as evident, and understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is for obvious reasons closely related to the speech acoustics (e.g. movements of the lips and jaw), but there is other articulatory movement affecting speech acoustics that is not visible on the outside of the face. On the other hand, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. The context of much of our research regarding these questions is to be able to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems.

Useful applications of talking heads include aids for the hearing impaired, educational software, audiovisual human perception experiments, entertainment, and high-quality audiovisual text-to-speech synthesis for applications such as news reading. In conversational systems, the talking head is intended to make interaction more effective by building on the user's social skills to improve the flow of the dialogue. Visual cues for feedback, turn-taking and signaling the system's internal state are key aspects of effective interaction.

This paper presents an overview of some of the research involved in developing audiovisual synthesis to improve the talking head. It includes examples of results and applications involving the analysis and modeling of acoustic and visual aspects of verbal and non-verbal communication.

Granström, B., & House, D. (2007). Inside out - Acoustic and visual aspects of verbal and non-verbal communication (Keynote Paper). Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, 11-18.

Granström, B., & House, D. (2007). Modelling and evaluating verbal and non-verbal communication in talking animated interface agents. In Dybkjaer, L., Hemsen, H., & Minker, W. (Eds.), Evaluation of Text and Speech Systems (pp. 65-98). Springer-Verlag Ltd.

Beskow, J., Granström, B., & House, D. (2007). Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents. In Esposito, A., Faundez-Zanuy, M., Keller, E., & Marinaro, M. (Eds.), Verbal and Nonverbal Communication Behaviours (pp. 250-263). Berlin: Springer-Verlag.

Björn Granström joined KTH in 1969 after graduating with an MSc in Electronic Engineering. Following further studies in Phonetics and General Linguistics at Stockholm University, he received the degree of Doctor of Science from KTH in 1977 with the thesis "Perception and Synthesis of Speech". In 1987 he succeeded Gunnar Fant as Professor in Speech Communication. He has been director of CTT (The Center for Speech Technology) since its start in 1996. Granström has published extensively in the area of speech technology. CTT participates actively in numerous European projects. His current research interests include multimodal communication systems using conversational agents.

Invited talk 2