Using Mary TTS

Mary TTS is an open-source, multilingual (emotional) text-to-speech synthesis platform written in Java, maintained by DFKI.

The Elckerlyc release contains a version of MaryTTS. You can include MaryXML speech commands in Elckerlyc by using the BMLT description level extension. Elckerlyc supports three description level extensions for using MaryXML in <speech> behaviors: you can control the exact pronunciation of speech by using Mary commands directly in MARYRAWXML, WORDS, or ALLOPHONES format. Other formats are easy to implement on request.

This document describes:

  • how to use these formats in BML requests
  • how to easily obtain versions of speech in the various formats, and how to add other information such as prosody or pauses.

Obtaining MaryTTS

Clearly, you can only use MaryXML speech if you have installed MaryTTS and are using one of the MaryTTS voices in the configuration of your virtual human. Downloads and documentation are available on the MaryTTS web page. Information on selecting MaryTTS as the speech generator for your virtual human can be found in the VirtualHumanSpec documentation.

A short example

To send MaryXML content in a BML <speech> behavior, you need to use description level elements (see the BML standard on description levels).

The format is like this:

<bml id="bml1">
<speech id="s1" start="0">
  <text>Specification of the text without <sync id="sync1"/>MaryXML markup</text>
  <description priority="1" type="[t]">
  [...maryxml content... contains <mark name="sync1"/> ]
  </description>
</speech>
</bml>

[t] can be one of: maryxml | marywords | maryallophones

[maryxml content] then needs to contain data in the format specified by [t].
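If you generate such requests from Java code, a small helper keeps the structure consistent. The sketch below is hypothetical (BmlBuilder and speechRequest are not part of Elckerlyc); it only assembles the string shown above, reusing the priority and start values from the example, and rejects any type other than the three listed. It does not escape its arguments and needs Java 9+ for Set.of.

```java
import java.util.Set;

// Hypothetical helper, not part of Elckerlyc: assembles a BML request with a
// Mary description level, following the format shown above.
final class BmlBuilder {
    // The three description types listed in this document.
    static final Set<String> MARY_TYPES = Set.of("maryxml", "marywords", "maryallophones");

    static String speechRequest(String bmlId, String speechId, String text,
                                String type, String maryContent) {
        if (!MARY_TYPES.contains(type)) {
            throw new IllegalArgumentException("unsupported description type: " + type);
        }
        // text may contain <sync id="..."/> elements; maryContent must then
        // contain the matching <mark name="..."/> elements.
        return "<bml id=\"" + bmlId + "\">\n"
                + "<speech id=\"" + speechId + "\" start=\"0\">\n"
                + "  <text>" + text + "</text>\n"
                + "  <description priority=\"1\" type=\"" + type + "\">\n"
                + maryContent + "\n"
                + "  </description>\n"
                + "</speech>\n"
                + "</bml>\n";
    }
}
```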

See below for the requirements on this content!

NOTE: To use sync points, add each one both to the plain text, as <sync id="..."/>, and to the MaryXML content, as <mark name="..."/>, using the same identifier in both places.
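Because each sync point has to be written twice, it is easy to forget one copy. A minimal check could look like this, assuming double-quoted attributes written exactly as <sync id="..."/> and <mark name="..."/>. SyncMarkCheck is a hypothetical helper, not part of Elckerlyc or Mary, and its regular expressions are a shortcut rather than a full XML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds sync ids in the plain text that have no
// matching <mark name="..."/> in the MaryXML description.
final class SyncMarkCheck {
    // Assumes double-quoted attributes, as in the examples on this page.
    private static final Pattern SYNC = Pattern.compile("<sync\\s+id=\"([^\"]+)\"");
    private static final Pattern MARK = Pattern.compile("<mark\\s+name=\"([^\"]+)\"");

    private static Set<String> collect(Pattern pattern, String input) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Returns the sync ids (in text order) that are missing a mark in the MaryXML. */
    static Set<String> missingMarks(String text, String maryContent) {
        Set<String> missing = collect(SYNC, text);
        missing.removeAll(collect(MARK, maryContent));
        return missing;
    }
}
```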

Requirements on the MaryXML to prevent Mary from crashing

A few notes on what is needed to keep Mary from crashing :)

  • leave out the initial <?xml ... ?> declaration
  • always insert a <phrase> element as a child of every <s> element
  • <s> elements are needed to make Mary produce a coherent intonation
  • use <mark name="..."/>, not <mark id="..."/>! Getting this wrong causes crashes!
  • sometimes newlines and spaces around the maryxml, s, p, and phrase tags are mandatory. If Mary crashes on your BML and you don't know why, try adding spaces and newlines in the XML content
  • <voice>: it is better to let Elckerlyc take care of voice selection, but you can still control the voice in the MaryXML
  • check that you only use parameters that are actually supported by the voice that Elckerlyc is using!
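Some of these rules can be checked before a request is sent. The following sketch uses the JDK's built-in DOM parser to flag three of them: an initial XML declaration, <s> elements without a <phrase> child, and <mark> elements without a name attribute. MaryXmlLint is a hypothetical name; it does not check whitespace or voice parameters, and it assumes the MaryXML uses unprefixed element names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical checker for some of the MaryXML rules listed above.
final class MaryXmlLint {
    static List<String> check(String maryContent) {
        List<String> problems = new ArrayList<>();
        String body = maryContent.trim();

        // Rule: leave out the initial <?xml ... ?> declaration.
        if (body.regionMatches(true, 0, "<?xml", 0, 5)) {
            problems.add("remove the initial <?xml ... ?> declaration");
            int end = body.indexOf("?>");
            if (end < 0) {
                return problems; // unterminated declaration: nothing more to parse
            }
            body = body.substring(end + 2).trim(); // keep checking the rest
        }

        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(body)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("MaryXML is not well-formed: " + e.getMessage(), e);
        }

        // Rule: every <s> element needs a <phrase> child.
        NodeList sentences = doc.getElementsByTagName("s");
        for (int i = 0; i < sentences.getLength(); i++) {
            if (!hasChildElement((Element) sentences.item(i), "phrase")) {
                problems.add("<s> element #" + (i + 1) + " has no <phrase> child");
            }
        }

        // Rule: marks use name=, not id=.
        NodeList marks = doc.getElementsByTagName("mark");
        for (int i = 0; i < marks.getLength(); i++) {
            if (!((Element) marks.item(i)).hasAttribute("name")) {
                problems.add("<mark> element #" + (i + 1) + " has no name attribute");
            }
        }
        return problems;
    }

    private static boolean hasChildElement(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }
}
```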

Obtaining and modifying MaryXML versions of speech

Suppose you have a text and need its ALLOPHONES representation in order to modify it.

Use the Mary web server interface for the desired voice (see Mary's own documentation): enter the text and convert it to the desired ALLOPHONES (or other) representation.
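The conversion can also be requested from code instead of through the web form. The sketch below assumes a Mary server with a MaryTTS 5.x-style HTTP interface: a /process endpoint taking INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE and LOCALE parameters, on the default port 59125. The Mary version shipped with Elckerlyc may expose a different interface, so check Mary's documentation. The class name MaryRequest is hypothetical, and java.net.http requires Java 11+.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client; ASSUMES a MaryTTS 5.x-style HTTP server (/process endpoint).
final class MaryRequest {
    static String processUrl(String host, int port, String text,
                             String outputType, String locale) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("INPUT_TEXT", text);
        params.put("INPUT_TYPE", "TEXT");
        params.put("OUTPUT_TYPE", outputType); // e.g. ALLOPHONES or WORDS
        params.put("LOCALE", locale);          // e.g. en_US
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return "http://" + host + ":" + port + "/process?" + query;
    }

    // Sends the GET request and returns the response body (the converted text).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

For example, processUrl("localhost", 59125, "Hello world", "ALLOPHONES", "en_US") builds the request URL; passing it to fetch returns the ALLOPHONES representation, if the server behaves as assumed.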

Then take the result and put it inside the <description> element in your BML, leaving out the initial <?xml ... ?> declaration, and set the correct type on the description element (see above!)
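This last step can be automated as well. The sketch assumes that Mary's output type names map onto the description types listed earlier as RAWMARYXML to maryxml, WORDS to marywords and ALLOPHONES to maryallophones; that mapping is an interpretation of the type names, not something this page confirms. It strips a leading XML declaration, if present, and wraps the rest in a <description> element. MaryOutputToDescription is a hypothetical name, and Map.of needs Java 9+.

```java
import java.util.Map;

// Hypothetical helper: turns Mary output into a BML <description> element.
final class MaryOutputToDescription {
    // ASSUMED mapping from Mary output types to the description types above.
    private static final Map<String, String> TYPE_FOR_OUTPUT = Map.of(
            "RAWMARYXML", "maryxml",
            "WORDS", "marywords",
            "ALLOPHONES", "maryallophones");

    static String descriptionElement(String maryOutputType, String maryOutput) {
        String type = TYPE_FOR_OUTPUT.get(maryOutputType);
        if (type == null) {
            throw new IllegalArgumentException("no description type for " + maryOutputType);
        }
        String body = maryOutput.trim();
        // The output may start with an XML declaration, which must be left out.
        if (body.startsWith("<?xml")) {
            int end = body.indexOf("?>");
            if (end >= 0) {
                body = body.substring(end + 2).trim();
            }
        }
        return "<description priority=\"1\" type=\"" + type + "\">\n" + body + "\n</description>";
    }
}
```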

