Aria Valuspa

Artificial Retrieval of Information Assistants – Virtual Agents with Linguistic Understanding, Social skills, and Personalised Aspects

ARIA-VALUSPA Platform 2.0 released

2017 started off well for the ARIA-VALUSPA project with the release of ARIA-VALUSPA Platform 2.0 (AVP 2.0), the second public release of the integrated behaviour analysis, dialogue management, and behaviour generation components developed as part of this EU Horizon 2020 project. The integrated Virtual Human framework will allow anyone to build their own Virtual Human scenarios. Students are already using the framework to build a face recognition system that includes liveness detection, and to administer questionnaires in a much more natural manner than asking people to fill in forms.

The AVP 2.0 can be downloaded from GitHub. It comes with installation instructions, and a tutorial for running the default interaction scenario.

The behaviour analysis component of the AVP 2.0 comes with integrated Automatic Speech Recognition in English, valence and arousal detection from audio, recognition of the six basic emotions from video, face tracking, head-pose estimation, and age, gender, and language estimation from audio.

The default scenario of the dialogue manager presents Alice, the main character of the book ‘Alice’s Adventures in Wonderland’ by Lewis Carroll. You can ask her questions about herself, the book, and the author. You can of course also create your own scenarios, and we’ve created a tutorial with three different scenarios specifically aimed at getting new users started with making their own systems.

The behaviour generation components come with emotional TTS created by CereProc, and visual behaviour generation using the GRETA system. They use standard FML and BML, and feature lip-synched utterances. The behaviour generation component has a unique feature that allows an ongoing animation to be stopped, so that a user can interrupt the agent, which makes interactions with it much more natural.

Another unique feature is the ability to record your interactions with the Virtual Humans. The framework stores raw audio and video, but also all predictions made by the analysis system (ASR, expressions, etc.), and in the near future it will also store all Dialogue Management and Behaviour Generation decisions, allowing you to replay the whole interaction. To simplify inspection and replay of an interaction, the platform integrates seamlessly with the NoVa annotation tool. NoVa is the new annotation tool developed as part of ARIA-VALUSPA to address shortcomings of existing multimedia annotation tools, as well as to provide integrated support for cooperative learning.

While the ARIA-VALUSPA Platform 2.0 presents a major step forwards in Virtual Human technology, we are always looking for ways to improve the system. Feature requests, bug reports, and any other suggestions can be logged through the GitHub issue tracker.

—————————————————————— Update ————————————————————————————

08. February 2017

A minor update to the ARIA-VALUSPA Platform for Virtual Humans has been released (AVP 2.1), containing mostly improved support for interruptions, logging of dialogue management actions, faster face tracking, and some bug fixes. Full release notes can be found here.

—————————————————————— Update ————————————————————————————
13. April 2017
An update to the ARIA-VALUSPA Platform for Virtual Humans has been released (AVP 2.2). Full release notes can be found here!

Expressive Speech Synthesis and Affective Information Retrieval

Human communication is rich, varied, and often ambiguous. This reflects the complexity and subjectivity of our lives. For thousands of years, art, music, drama and storytelling have helped us understand, come to terms with, and express the complexities of our existential experience. Technology has long played a pivotal role in this artistic process, for example the role of optics in the development of perspective in painting and drawing, or the effect of film on storytelling.

Information Technology has had, and is still having, an unprecedented impact both on our experience of life and on our means of interpreting this experience. However, the ability to harness this technology to help us understand, come to terms with, and mediate the explosion of electronic data and electronic communication that now exists is generally limited to the mundane. Whereas the ability to get the height in metres of Everest is a trivial search request (8,848m by the way from a Google search), googling the question ‘What is love?’ returns (in the top four) two popular newspaper articles, a YouTube video of Haddaway and a dating site. It is, of course, an unfair comparison. Google is not designed to offer responses to ambiguous questions with no definite answers. In contrast, traditional forms of art and artistic narrative have done so for centuries.

We might expect speech and language technology, dealing as it does with such a central form of human communication, to be at the forefront of applying technology to the interpretation of our ambiguous and multi-layered experience. In fact, much of the work in this area has avoided ambiguity and is often used as a tool to disambiguate information rather than as a means to interpret ambiguity. Take, for example, conversational agents (CAs): these are computer programs which allow you to speak to a device and which respond to you using computer-generated speech. These systems can potentially harness the nuances of language and the ambiguity of emotional expression. However, in reality, we use them to ask how high Everest is or where we can find a nearby pizza restaurant. Although the ability to deal with these requests is important if you are writing an assignment about Everest, or wanting to eat pizza, it raises the question of how we might extend such systems to help us interpret more complex aspects of the world around us. It is important for this technology to strive to do so for two fundamental reasons: firstly, technology has become part of our social life, and as such it needs to be able to engender playfulness and enrich our sense of experience; and secondly, applications which could perform a key role in mediating technology for social good require a means of interacting with users in much more complex social and cultural situations.

Conversation has a tradition as a pastime, as a means of humour, and as a means of helping people with their problems. However, the scope for artificial conversational agents to perform these activities is currently severely limited. In ARIA-VALUSPA we explore approaches that give conversational agents more subtle means of communicating, of becoming more playful, and of representing the ambiguity in our social experience.

The technology needed to do this requires close collaboration with engineers working on dialogue. CereProc Ltd, a key partner in ARIA-VALUSPA, is very active in developing techniques to make artificial voices (termed speech synthesis or text-to-speech synthesis, TTS) more emotional, expressive and characterful. These techniques include changing voice quality – for example making a voice sound stressed or calm – adding vocal gestures like sighs and laughs, changing the emphasis from one word to another to alter the subtle meaning in a sentence, changing the rate of speech and the intonation to change how active the voice sounds, and even just making sure the voice doesn’t say the word ‘yes’ the same way every time it says it.

This key work on the way speech is produced can then be used to alter the perceived character of a conversational agent, or convey an internal state.

So perhaps, in the future, when we ask a system ‘What is Love?’, its wistful voice will hint at past romance lost and the sense of longing for a human connection, and help us find answers which are not fixed, but emotional, and which depend on the different experiences we share as human beings.

Gartner identifies digital assistants as key to developing digital business opportunities

In the Top 10 strategic technology trends for 2016 and after, Gartner identified artificial intelligence and digital assistants as key to developing digital business opportunities. The market for Intelligent Virtual Assistants (IVAs) is growing considerably. Having simple tools to create IVAs is a priority for companies that want to have a dedicated virtual assistant that they can easily modify and maintain independently. A leading-edge platform that automates the process of IVA creation, improvement, and sustainment has been developed by ARIA-VALUSPA’s partner Living Actor™.
Within the ARIA-VALUSPA project, Living Actor™ explores the contribution of emotions in human-machine interactions, focusing on animated avatars. More information can be found here.


A reporter from the WDR (West German Broadcasting Cologne) visited the Lab for Human-Centered Multimedia at Augsburg University, and Johannes Wagner presented the Alice agent to him. Alice does not just answer the user’s questions about “Alice in Wonderland”, but also responds to his or her emotions by analyzing voice and facial expressions. The synchronization of the multimodal signals is done by the award-winning SSI system developed by Augsburg University (