Components of a Dialogue Content


This page briefly describes what dialogue content consists of in MMDAgent-EX. See the File Format section for details of each definition file.


“Dialogue content” is a set of files that define the components of a dialogue system. It includes a content-specific user dictionary for speech recognition, a voice model for speech synthesis, 3-D models, images, text, motions, and dialogue scenarios. By creating these files, you can build your own voice conversation or speech interaction.

MMDAgent-EX is compatible with MMDAgent, so content made for MMDAgent also works with MMDAgent-EX.

The included files are listed below. See the Tutorial section for how to modify them, and the Files section for their detailed file formats.

  • Start-up file (.mdf)
  • Dialogue system components
    • Dialogue script (.fst)
    • 3-D models and motions (.pmd, .vmd)
    • TTS Voice model (.htsvoice)
    • Open JTalk setting file (.ojt)
    • Other assets (images etc.)
  • Additional resources for each module
    • Recognition word dictionary (.dic)
    • Rapid word dictionary (.rapiddic)
    • Julius JConf file (.jconf)
    • Button definitions (BUTTON*.txt)
  • Package definition
    • Package description (PACKAGE_DESC.txt)
    • Description text (README.txt)


The default dialogue management module of MMDAgent-EX is a simple graph-based dialogue manager. Its scenarios are written in OpenFST format, with messages as input and output:

  • Input: messages from modules
  • Output: messages that will be sent to modules

A simple example of an FST scenario follows. Assume the scenario starts at state 0. The first line means: “When the current state is 0 and the message RECOG_EVENT_STOP|hello arrives, output nothing (<eps> means no output) and go to state 10.” In state 10, the input is <eps>, so the scenario immediately outputs the message MOTION_ADD|mei|greet|greet.vmd and goes to state 11. State 11 likewise immediately outputs SYNTH_START|mei|normal|hi and goes to state 12. State 12 waits until the message SYNTH_EVENT_STOP|mei arrives, then outputs nothing and returns to state 0.

 0     10    RECOG_EVENT_STOP|hello   <eps>
10     11    <eps>                    MOTION_ADD|mei|greet|greet.vmd
11     12    <eps>                    SYNTH_START|mei|normal|hi
12      0    SYNTH_EVENT_STOP|mei     <eps>
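
The walk-through of this scenario can be mimicked with a small simulator. The following Python sketch is only an illustration of the state-transition behavior described here, not how MMDAgent-EX implements its dialogue manager. The names parse_fst, follow_eps, and feed are hypothetical, and the sketch assumes each line holds exactly four whitespace-separated fields (so messages contain no spaces) and that there are no cycles made only of <eps> inputs.

```python
# Illustrative simulator for the four-line scenario above.
# All function names here are hypothetical, not part of MMDAgent-EX.

FST_TEXT = """
 0     10    RECOG_EVENT_STOP|hello   <eps>
10     11    <eps>                    MOTION_ADD|mei|greet|greet.vmd
11     12    <eps>                    SYNTH_START|mei|normal|hi
12      0    SYNTH_EVENT_STOP|mei     <eps>
"""

EPS = "<eps>"


def parse_fst(text):
    """Parse 'src dst input output' lines into a dict keyed by source state."""
    transitions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines in this sketch
        src, dst, inp, out = fields
        transitions.setdefault(int(src), []).append((int(dst), inp, out))
    return transitions


def follow_eps(transitions, state, outputs):
    """Take <eps>-input transitions immediately, appending their outputs.

    Assumes the scenario has no cycle made only of <eps> inputs.
    """
    moved = True
    while moved:
        moved = False
        for dst, inp, out in transitions.get(state, []):
            if inp == EPS:
                if out != EPS:
                    outputs.append(out)
                state = dst
                moved = True
                break
    return state


def feed(transitions, state, message):
    """Feed one input message; return (new_state, list_of_output_messages)."""
    outputs = []
    for dst, inp, out in transitions.get(state, []):
        if inp == message:
            if out != EPS:
                outputs.append(out)
            # After moving, keep going through any <eps> transitions.
            state = follow_eps(transitions, dst, outputs)
            break
    return state, outputs


if __name__ == "__main__":
    fst = parse_fst(FST_TEXT)
    state, outputs = feed(fst, 0, "RECOG_EVENT_STOP|hello")
    print(state, outputs)
    state, outputs = feed(fst, state, "SYNTH_EVENT_STOP|mei")
    print(state, outputs)
```

Feeding RECOG_EVENT_STOP|hello from state 0 passes through states 10 and 11 and stops at state 12 with the two output messages collected. Feeding SYNTH_EVENT_STOP|mei then returns to state 0 with no output. A message that matches no transition leaves the state unchanged in this sketch.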

MMDAgent-EX and its plugin modules output various messages while running, and also accept messages issued by other modules. See the Message Reference page for all kinds of messages.

Speech Recognition

Speech Synthesis

3-D models and motions

You can use any PMD model, most PMX models, and their MikuMikuDance motion files in MMDAgent-EX. The CG rendering part is fully compatible with MikuMikuDance (MMD).

MikuMikuDance is free, lightweight software that lets users create 3D animated movies. The MMD format is expressive enough for a modern virtual agent, with cartoon-like rendering and physics simulation. Its adequate capability, expressiveness, and availability were the key reasons we adopted the format for an agent-based spoken dialogue system.

The original MMDAgent supports only PMD, while MMDAgent-EX can also render PMX models. However, you must convert a PMX model to PMD and CSV prior to use. See the PMX file format document for details.

Be careful about licensing. When you use an MMD model or motion obtained from the net, respect the license terms set by its authors. For historical reasons, many MMD materials are intended to be shared only within the MMD fan community, and (re-)distributing them outside that community is commonly not welcomed.

Pay attention to the readme files included in the archives! Even if they are written in Japanese, you have good translation tools at hand!


Sound / Music

You can play sound files in mp3 and wav format. Place the sound file in the content and send a SOUND_START|filename message to start playing it. The mp3 and wav formats are supported on all platforms. Other audio formats may be available depending on the OS.


You can either supply images for the background and the floor, or supply a PMD stage model.


Floating image / text

Put any text or image onto the scene, or open a text document file in full screen.