Audio & Sound Playback
Audio file playback is provided by Plugin_Audio, and speech lip-sync playback is provided by Plugin_Remote. Make sure these plugins are enabled when using these features.

Audio & Sound Playback #

Using music and sound effects effectively can create richer interactions. MMDAgent-EX can play audio files, allowing you to play background music or trigger sound effects for events.

MMDAgent-EX also supports playing pre-recorded (or synthesized) audio with character lip-sync. This can be used to have the character respond with pre-recorded audio or speak audio files generated by another TTS engine.

Below we explain how to play sound files and how to play audio files with lip-sync.

Preparation #

MMDAgent-EX plays sound through the system’s default audio output device. Make sure the device you want to use for audio playback is set as the default output device.

On macOS and Linux, install sox beforehand.

macOS:

brew install sox

Linux:

sudo apt install sox
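
To verify the installation, you can check that sox’s play command is available on your path (this check is optional; the exact version output varies by environment):

play --version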

Playing Sounds #

You can play audio files (.wav, .mp3, etc.) using the SOUND_START message.

Supported formats

Windows: Supports .wav and .mp3 by default. To play other formats you may need to install appropriate codecs or drivers.

macOS, Linux: Playback uses sox’s play command and supports most audio formats including .wav and .mp3.

When playback starts, a SOUND_EVENT_START message is emitted.

SOUND_START|(sound alias)|(sound file name)
SOUND_EVENT_START|(sound alias)
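
For example, the following plays a sound-effect file and shows the event that follows; the alias bell and the path sound/bell.mp3 are hypothetical placeholders:

SOUND_START|bell|sound/bell.mp3
SOUND_EVENT_START|bell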

Use the SOUND_STOP message to stop a playing sound.

SOUND_STOP|(sound alias)

When audio playback finishes (or is stopped), SOUND_EVENT_STOP is emitted.

SOUND_EVENT_STOP|(sound alias)

If you don’t hear any sound, check the system’s default audio output device. On macOS and Linux, playback uses sox’s play command; a different playback command can be specified via environment variables.
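
Putting it together, a typical message sequence for background music might look like the following; the alias bgm and the file name are hypothetical:

SOUND_START|bgm|audio/bgm.mp3
SOUND_EVENT_START|bgm
SOUND_STOP|bgm
SOUND_EVENT_STOP|bgm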

Speech Playback with Lip-sync #

Preparation #

Lip-sync maps phoneme information extracted from the audio file to blends of mouth-shape morphs (e.g., “a”, “i”, “u”, “o”). You must therefore define in advance, in a .shapemap file, which morphs on the model correspond to those mouth shapes. The models distributed with MMDAgent-EX already include this file, but for other models you will need to create it yourself.

To create it, save a text file named xxx.pmd.shapemap in the same folder as the model file xxx.pmd. Specify the model’s morph names for LIP_A, LIP_I, LIP_U, and LIP_O. In NOLIP, list, comma-separated, all other morphs that open the mouth; these are reset to 0 during lip-sync. The file must be UTF-8 encoded.

#### Morph names for lip sync
LIP_A a
LIP_I i
LIP_U u
LIP_O o
#### List of morph names to reset to 0 during lip sync
#### Specify all mouth-opening morphs not specified above
NOLIP e, oo~, Wa, eh, ah, ii, shout, ah-le, mouth smile

Playback #

Once the system is running, play an audio file with lip-sync using the SPEAK_START message. (model alias) specifies the alias of the model that should lip-sync, and (audio file) is the audio file to play. When playback starts, SPEAK_EVENT_START is emitted.

SPEAK_START|(model alias)|(audio file)
SPEAK_EVENT_START|(model alias)
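
For example, assuming a model loaded under the alias mei and a recorded file voice/greeting.wav (both names are hypothetical):

SPEAK_START|mei|voice/greeting.wav
SPEAK_EVENT_START|mei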

Audio files can be in formats supported by libsndfile (e.g., .wav, .mp3): https://libsndfile.github.io/libsndfile/formats.html

When playback ends, SPEAK_EVENT_STOP is emitted.

SPEAK_EVENT_STOP|(model alias)

To stop playback midway, issue SPEAK_STOP. When the audio is stopped (or has already finished), SPEAK_EVENT_STOP is emitted.

SPEAK_STOP|(model alias)
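
Continuing the hypothetical example above, stopping mei’s speech midway looks like this:

SPEAK_STOP|mei
SPEAK_EVENT_STOP|mei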

Playback sampling rate and timing drift (v1.0.4) #

Up to v1.0.3, SPEAK_START converted audio to 16 kHz mono for lip-synced playback. From v1.0.4, playback is handled by Plugin_Audio, so audio plays at its original sampling rate, improving quality.

However, because lip-sync and audio playback now run on separate threads, their start timing may drift slightly depending on the environment. To revert SPEAK_START to the behavior of v1.0.3 and earlier, add the following to your .mdf:

Plugin_Remote_Speak_16k=true