Content Overview

Components of a dialogue content.

This page briefly describes what a dialogue content consists of in MMDAgent-EX. See the File Format section for details of each definition file.


A “dialogue content” is a set of files that defines the components of a dialogue system: a content-specific user dictionary for speech recognition, a voice model for speech synthesis, 3-D models, images, text, motions, and dialogue scenarios. By creating these files, you can build any voice conversation or speech interaction.

The files in a dialogue content are listed below. Files marked with [*] should be located in the same folder. See the Files section for details of their formats.

  |- Startup / Configuration file (.mdf) [*]
  |- Dialogue scenario script (.fst) [*]
  |- Recognition word dictionary (.dic) [*]
  |- Rapid word dictionary (.rapiddic) [*]
  |- Julius JConf file (.jconf) [*]
  |- Open JTalk setting file (.ojt) [*]
  |- Button definitions (BUTTON0.txt - BUTTON9.txt) [*]
  |- Package description (PACKAGE_DESC.txt) [*]
  |- Description text (README.txt) [*]
  +- (SubDirectories)
      |- 3-D models (.pmd)
      |- Motions (.vmd)
      |- TTS Voice model (.htsvoice)
      |- Background/Floor (images)
      |- Sound / Music files (sound files)
      |- Stage models (.pmd)
      |- Other assets (images, text files, etc.)
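
As a rough illustration of the layout above, the following Python sketch scans a content folder for the top-level files in the list. It is not part of MMDAgent-EX: the function name `scan_content` is hypothetical, and treating everything except the .mdf file as optional is an assumption of this sketch (this page only states that the .mdf file is required for all contents).

```python
from pathlib import Path

# Top-level file patterns taken from the list above.
# Only the .mdf file is described as required on this page; treating the
# rest as optional is an assumption of this sketch.
REQUIRED = ["*.mdf"]
OPTIONAL = ["*.fst", "*.dic", "*.rapiddic", "*.jconf", "*.ojt",
            "BUTTON[0-9].txt", "PACKAGE_DESC.txt", "README.txt"]


def scan_content(folder):
    """Return (missing_required_patterns, found_files_by_pattern)."""
    root = Path(folder)
    found = {}
    for pattern in REQUIRED + OPTIONAL:
        # Non-recursive: the [*] files are expected in the same folder.
        found[pattern] = sorted(p.name for p in root.glob(pattern))
    missing = [pattern for pattern in REQUIRED if not found[pattern]]
    return missing, found
```

Calling `scan_content("path/to/content")` returns the required patterns that matched nothing, plus a dictionary of the file names found for each pattern.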

Startup / Configuration file (.mdf)

A text file containing system configurations and parameters. Open this file with MMDAgent or MMDAgent-EX to start the content. This file is required for all contents. See its file format page for the full list of configurable parameters.

# example of .mdf file

Dialogue scenario script (.fst)

A text file containing the dialogue management definition, written in OpenFST format. See the reference page for how to write it.

# example of .fst file
 0     10    RECOG_EVENT_STOP|hello   <eps>
10     11    <eps>                    MOTION_ADD|mei|greet|greet.vmd
11     12    <eps>                    SYNTH_START|mei|normal|hi
12      0    SYNTH_EVENT_STOP|mei     <eps>
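
To make the four-column layout of the example concrete, here is a minimal Python sketch that reads such lines into (state, next state, condition, action) tuples. It is based only on the example above, not on the MMDAgent-EX parser; the names `Transition` and `parse_fst` are hypothetical, and skipping lines that start with `#` is an assumption.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    state: str       # current state label
    next_state: str  # state to move to
    condition: str   # input message, or <eps> when there is none
    action: str      # output message, or <eps> when there is none


def parse_fst(text):
    """Parse four-column transitions, skipping blank and '#' lines."""
    transitions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on runs of whitespace into at most four fields.
        fields = line.split(None, 3)
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns: {line!r}")
        transitions.append(Transition(*fields))
    return transitions
```

Applied to the example above, the first transition reads as: in state 0, when the message `RECOG_EVENT_STOP|hello` arrives, move to state 10 without issuing any message.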

Speech recognition setting files (.dic, .rapiddic, .jconf)

The .dic file is an optional user dictionary for the Julius speech recognizer. Adding task-specific words to this file makes MMDAgent-EX more likely to recognize them. See the reference page for details.

# example of .dic file
<unk> @1.0 <unk> [MMDAgent] e m u e m u d i: e: j e N t o
<unk> @2.0 <unk> [おっはー] O q h a:
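
The sketch below shows one way to pull the bracketed word and its phoneme sequence out of entries shaped like the two lines above. It relies only on the visible layout of those lines (leading fields, a word in square brackets, then space-separated phonemes) and does not interpret the leading fields; `parse_dic` is a hypothetical helper, and skipping `#` lines is an assumption.

```python
import re

# Leading fields (not interpreted here), the word in [brackets],
# then a space-separated phoneme sequence.
ENTRY = re.compile(r"^.*?\[(?P<word>[^\]]+)\]\s+(?P<phones>.+)$")


def parse_dic(text):
    """Return a list of (word, [phoneme, ...]) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized entry: {line!r}")
        entries.append((match.group("word"), match.group("phones").split()))
    return entries
```

A check like this can catch entries with a missing bracket or an empty phoneme sequence before the content is loaded.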

The .jconf file is an optional configuration file for the Julius speech recognizer. Use it to pass any Julius configuration parameters in addition to the system defaults. See the Julius documentation for all available options.

## example of .jconf file
# set lower audio trigger level threshold
-lv 120
# set duration time to reject too long input
-rejectlong 6000
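
For tooling around a content (for example, reporting which recognizer options a content overrides), a .jconf file shaped like the example can be read with a few lines of Python. This is only a sketch: `parse_jconf` is a hypothetical helper, and it assumes one option per line with its arguments on the same line, as in the example above.

```python
def parse_jconf(text):
    """Map each '-option' to the list of arguments on the same line."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        if not tokens[0].startswith("-"):
            raise ValueError(f"expected an option starting with '-': {line!r}")
        options[tokens[0]] = tokens[1:]
    return options
```

For the example above, the result maps `-lv` to `["120"]` and `-rejectlong` to `["6000"]`.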

Speech synthesis setting files (.ojt, .htsvoice, etc.)

Definition files for the “Open JTalk” speech synthesis module. They are required for speech synthesis. The .ojt file defines voice names and configuration parameters. See the file format page for how to set up voice parameters in MMDAgent-EX.

## example of .ojt file
# number of voices
# voice names
# number of speaking styles
# speaking style names, interpolation weight, and synthesis parameter
mei_voice_normal   1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.52 1.0
mei_voice_angry    0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.1 -0.5  0.52 1.1
mei_voice_bashful  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.5  0.52 0.9
mei_voice_happy    0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  1.1  1.5  0.52 1.0
mei_voice_sad      0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  1.0 -0.5  0.52 0.9
mei_voice_fast     1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  2.0  1.0  0.52 1.0
mei_voice_slow     1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.5  1.0  0.52 1.0
mei_voice_high     1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  4.0  0.52 1.0
mei_voice_low      1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0 -2.0  0.52 1.0

Voice model files (.htsvoice) should be trained from a speech corpus with HTS and can be placed anywhere in the content. (MMDAgent-EX does not include default voice definitions in its distribution.)

3-D models and motions (.pmd, .vmd)

You can use any PMD model, most PMX models, and their VMD motion files for MikuMikuDance in MMDAgent-EX. The CG rendering part is fully compatible with MikuMikuDance (MMD).

MikuMikuDance is free, lightweight software that lets users create 3D animated movies. The MMD format is expressive enough for a modern virtual agent, with cartoon-like rendering and physics simulation. Its adequate capability, expressiveness, and availability were the key reasons we adopted the format for an agent-based spoken dialogue system.

The original MMDAgent supports only PMD, but MMDAgent-EX can also render PMX models. However, you must convert PMX to PMD and CSV prior to use. See the PMX file format document for details.

Stage (image, .pmd)

A stage image (background and floor) or a stage 3-D model can be used to set up the scene behind the agent. You can give either background and floor images or a PMD stage model with the STAGE command message inside the dialogue scenario. The size of the background and floor can be changed with the stage_size parameter in the .mdf file.

Here are example messages for setting or changing the stage. See the reference for details.

STAGE|(bitmap file name for floor),(bitmap file name for back)
STAGE|(stage file name, .xpmd or .pmd)
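
If scenario lines are generated by a script, the two STAGE forms above can be assembled as follows. This is a sketch under assumptions: `stage_message` and `fst_line` are hypothetical helpers, any file names passed to them are placeholders, and the generated line simply follows the four-column .fst layout shown earlier on this page.

```python
def stage_message(floor=None, back=None, model=None):
    """Build a STAGE message from two images or from one stage model."""
    if model is not None:
        if floor is not None or back is not None:
            raise ValueError("give either images or a stage model, not both")
        return f"STAGE|{model}"
    if floor is None or back is None:
        raise ValueError("both floor and back images are required")
    # Floor image first, then background image, as in the format above.
    return f"STAGE|{floor},{back}"


def fst_line(state, next_state, action, condition="<eps>"):
    """Format one four-column .fst transition that issues `action`."""
    return f"{state} {next_state} {condition} {action}"
```

For example, `fst_line(0, 1, stage_message(model="stage.pmd"))` yields a transition that issues the stage message when moving from state 0 to state 1.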

Camera (parameter or .vmd)

You can set the camera position with the CAMERA message, or specify camera movement as a VMD motion file made in MikuMikuDance, inside the dialogue scenario.

Here is an example message for changing the camera position or starting a camera motion. See the reference for details.

CAMERA|(camera motion file name)

Sound / Music (.mp3, .wav, etc.)

mp3, wav, and other formats are supported. Place the sound file in the content and use the SOUND_START message in the dialogue scenario to start playing it.

The mp3 and wav formats are supported on all platforms. MMDAgent-EX simply calls the audio API of each OS to play a sound, so the other available audio formats depend on the API in use.

Here is an example message that makes MMDAgent-EX play a sound. See the message description for details on how to use it.

SOUND_START|(sound alias)|(sound file name)

Raw image / text (image, .txt)

You can put any text or image in the scene, or open a text document file in full screen. Use the TEXTAREA_ADD and TEXTAREA_SET messages to display short text or an image in the 3-D scene. See the reference for details.

TEXTAREA_ADD|(textarea alias)|(width,height)|(size,margin,exlinespace)|r,g,b,a|r,g,b,a|x,y,z
TEXTAREA_SET|(textarea alias)|(text)
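
The field order of the two messages above can be captured in small helper functions, which is convenient when many text areas are generated by a script. This is a sketch only: `textarea_add` and `textarea_set` are hypothetical names, and because this page does not say what the two r,g,b,a groups mean, the parameters are neutrally called `color1` and `color2`.

```python
def _join(values):
    """Join numbers with commas, e.g. (1, 2.5) -> '1,2.5'."""
    return ",".join(str(v) for v in values)


def textarea_add(alias, size, text_layout, color1, color2, position):
    """Build a TEXTAREA_ADD message.

    size        -- (width, height)
    text_layout -- (size, margin, exlinespace)
    color1      -- first (r, g, b, a) group in the format above
    color2      -- second (r, g, b, a) group in the format above
    position    -- (x, y, z)
    """
    parts = ["TEXTAREA_ADD", alias, _join(size), _join(text_layout),
             _join(color1), _join(color2), _join(position)]
    return "|".join(parts)


def textarea_set(alias, text):
    """Build a TEXTAREA_SET message for an existing text area."""
    return f"TEXTAREA_SET|{alias}|{text}"
```

The returned strings are meant to be placed in the output column of a scenario transition, in the same way as the other messages on this page.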

You can also show the content of a text file in full screen in the middle of a dialogue scenario and force the user to respond, using INFOTEXT messages. The text file should be in UTF-8.



You can show a prompt dialog in the middle of a dialogue scenario and give users a chance to respond by tap or click, using the PROMPT message.

Buttons on screen (BUTTON*.txt)

You can configure optional buttons to be displayed on the screen and define the action to execute when they are tapped. The definition files are BUTTON0.txt to BUTTON9.txt.

Package info (PACKAGE_DESC.txt)

It is recommended that you properly define package information in PACKAGE_DESC.txt so that MMDAgent-EX can handle and present the content more appropriately. See its reference page for details.


If a README text file is included in the content, it is displayed at the first launch of the content and after an update has been detected. The file name of the README should be given in the PACKAGE_DESC.txt file. The character encoding should be UTF-8.

Last modified January 13, 2021: Updated content (5da0ab2)