Using Mary TTS

Mary TTS is an open-source, multilingual (emotional) Text-to-Speech Synthesis platform written in Java, maintained by DFKI.

You can include MaryXML speech commands in Elckerlyc by using the BMLT description level extension. Elckerlyc supports three description level extensions for using MaryXML in <speech> behaviors. They let you control the exact pronunciation of speech elements by writing Mary commands directly in RAWMARYXML, WORDS, or ALLOPHONES format. Support for other Mary formats can be added on request.

This document describes:

  • how to use these formats in BML requests
  • how to easily obtain versions of speech in the various formats, and how to add other information such as prosody or pauses.

Obtaining MaryTTS

You can only use MaryXML speech if you have installed MaryTTS and are using one of the MaryTTS voices in the configuration of your virtual human. Downloads and documentation are found at the MaryTTS web page. Information on selecting MaryTTS as the speech generator for your virtual human can be found in the VirtualHumanSpec documentation.

A short example

To send MaryXML content in a BML <speech> behavior, you need to use description level elements (see the BML standard on description levels).

The format is like this:

<bml id="bml1">
<speech id="s1" start="0">
  <text>Specification of the text without <sync id="sync1"/>MaryXML markup</text>
  <description priority="1" type="[t]">
  [...maryxml content... contains <mark name="sync1"/> ]
  </description>
</speech>
</bml>

[t] can be one of: maryxml | marywords | maryallophones

[maryxml content] then needs to contain data in the format specified by [t].

See below for the requirements to this content!

NOTE: When you want to use sync marks, you need to add them both in the basic text as <sync id="..."/>, and in the maryxml as <mark name="..."/>
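
The rule in the note above is easy to break when the Mary content is edited by hand, so a small check can help. The following is a minimal sketch in Python, not part of Elckerlyc or MaryTTS: the function name unmatched_sync_points and the sample Mary fragments are hypothetical, and the regular expressions assume the sync points and marks are written as self-closing tags with double-quoted attributes, as in the example above.

```python
import re


def unmatched_sync_points(text, mary_content):
    """Return ids that appear as <sync id="..."/> in the text or as
    <mark name="..."/> in the Mary content, but not in both."""
    sync_ids = set(re.findall(r'<sync\s+id="([^"]+)"\s*/>', text))
    mark_names = set(re.findall(r'<mark\s+name="([^"]+)"\s*/>', mary_content))
    # Symmetric difference: ids present on one side only.
    return sync_ids ^ mark_names


# The <text> content from the example above.
text = 'Specification of the text without <sync id="sync1"/>MaryXML markup'

# Hypothetical, simplified Mary fragments used only for illustration.
good = '<t>Specification</t> <mark name="sync1"/> <t>markup</t>'
bad = '<t>Specification</t> <t>markup</t>'

print(unmatched_sync_points(text, good))  # set()
print(unmatched_sync_points(text, bad))   # {'sync1'}
```

An empty result means every sync point in the text has a matching mark in the Mary content and the other way around.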

To integrate speech and prosody features into a BML script, generate the MaryXML content with MaryTTS as follows:

  • Start the Mary server: c:\hmisvn\MaryTTS\bin\maryserver.bat
  • Start the MaryEditor in a browser window.
  • Select the proper voice.
  • First window: change the type to text and insert the text you want to change.
  • Second window: change the type to marywords.
  • Change the type of the first window to marywords and copy-paste the content of the second window into the first.
  • Second window: change the type to audio.
  • Copy-paste everything in between the <description> tags of the following BML:

<bml><speech><text></text>
<description priority="1" type="marywords">

</description>
</speech></bml>

  • Delete <?xml … utf-8>, the opening <voice ...> tag and </voice> from the pasted content.

For short breaks, use <boundary breakindex="..."/> with a break index from 1 to 6.

For changing word length, use <prosody rate="...%"> ... </prosody>, with the rate given as a percentage.
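
The clean-up and wrapping steps at the end of the workflow above could also be scripted. The following Python sketch is an illustration under assumptions, not a tool shipped with Elckerlyc or MaryTTS: the function names strip_for_bml and wrap_in_bml, the voice name "example-voice", and the simplified sample output are all hypothetical. Real MaryEditor output contains more attributes and elements than this sample.

```python
import re


def strip_for_bml(mary_output):
    """Remove the XML declaration and the <voice ...> / </voice> tags,
    keeping everything that was inside the voice element."""
    without_decl = re.sub(r"<\?xml[^>]*\?>", "", mary_output)
    without_voice = re.sub(r"</?voice\b[^>]*>", "", without_decl)
    return without_voice.strip()


def wrap_in_bml(text, mary_content, bml_id="bml1", speech_id="s1"):
    """Place cleaned marywords content inside a BML speech behavior."""
    return (
        f'<bml id="{bml_id}">\n'
        f'<speech id="{speech_id}" start="0">\n'
        f'  <text>{text}</text>\n'
        f'  <description priority="1" type="marywords">\n'
        f'{mary_content}\n'
        f'  </description>\n'
        f'</speech>\n'
        f'</bml>'
    )


# Hypothetical, simplified stand-in for text copied from the MaryEditor.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<maryxml version="0.5" xml:lang="en-US">'
    '<p><voice name="example-voice"><s><t>Hello</t></s></voice></p>'
    '</maryxml>'
)

cleaned = strip_for_bml(sample)
print(wrap_in_bml("Hello", cleaned))
```

Boundary and prosody elements can then be added by hand to the cleaned content before it is sent as part of the BML request.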

