How to speak

Tips to make speech recognition work efficiently.

This page is still under development.

The current system only has Japanese models for speech processing, so the default contents are for the Japanese language only. We already have English speech model assets and are now working on an English version.

After launch, MMDAgent-EX is always listening to the audio input and is ready to recognize speech. MMDAgent-EX fully incorporates the open-source speech recognition engine Julius. This page explains the basic usage of speech recognition in MMDAgent-EX.

The ASR engine Julius is embedded in MMDAgent-EX. All models that work with Julius can be used in MMDAgent-EX. See the related section to learn how to change the speech recognition settings or adapt to another language.

Check audio input level

The audio input volume and status are shown as a circle indicator at the bottom left of the screen.

The circle size indicates the input volume of the live audio. When no sound input is detected, the circle is drawn in blue. When some sound is detected and is being processed by the engine, the circle changes to yellow. While valid speech input is detected and the speech recognition process is running, a wide orange circle is shown.

Audio indicator at the bottom left. (1) Blue indicates that no sound is detected, and (2) yellow means that some sound is detected and is being processed by the engine. (3) While speech recognition is running to output a result, the wide orange circle is drawn.
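The three indicator states described above can be summarized as a small lookup table. This is only an illustrative sketch, not MMDAgent-EX code: the names `InputState`, `COLORS`, and `describe_indicator` are hypothetical, and the input level is assumed to be normalized to the range 0.0 to 1.0.

```python
from enum import Enum


class InputState(Enum):
    SILENT = 1       # no sound input detected
    SOUND = 2        # some sound detected and being processed by the engine
    RECOGNIZING = 3  # valid speech detected, recognition running


# Colors as described on this page; the mapping structure itself is hypothetical.
COLORS = {
    InputState.SILENT: "blue",
    InputState.SOUND: "yellow",
    InputState.RECOGNIZING: "orange",
}


def describe_indicator(state, level):
    """Return (color, relative_size) for a state and an input level.

    The level is clamped to [0.0, 1.0] (an assumed normalization) and used
    directly as the relative circle size.
    """
    level = max(0.0, min(1.0, level))
    return COLORS[state], level


print(describe_indicator(InputState.SOUND, 0.4))
```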

Scaling input volume

MMDAgent-EX tries to detect the user’s utterances automatically, but detection sometimes does not work well in a noisy environment. In such a case, adjusting the scaling factor of the input volume will help.

To show the input volume scaling widget, swipe from the left edge of the screen, press the “a” key, or open the menu and tap “Input Volume” at the top.

The input volume scale control widget shown at the left of the screen.

The scaling factor of the audio input can be controlled by moving the handle. The horizontal bar indicates the default point (scaling factor = 1.0). Moving the slider up amplifies the input, and moving it down decreases the volume. Setting the handle to the lowest position sets the scaling factor to 0.0, which means that all input is muted.

Note that this is a soft volume control: it only scales the captured audio data inside MMDAgent-EX after recording. MMDAgent-EX does not access the hardware volume or change the mixer settings of your device. You should also check the actual audio input volume on your device.
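To illustrate what a soft volume control means, the following minimal sketch multiplies already-captured samples by a scaling factor. It is not the MMDAgent-EX implementation: the function name `scale_pcm16` is hypothetical, the audio is assumed to be signed 16-bit PCM, and clipping to the 16-bit range is an assumption about how overflow would be handled.

```python
import array


def scale_pcm16(samples, factor):
    """Scale signed 16-bit PCM samples by `factor`, clipping to the valid range."""
    if factor < 0:
        raise ValueError("factor must be non-negative")
    scaled = array.array("h")
    for s in samples:
        v = int(round(s * factor))
        # Clip so that amplified samples do not wrap around (assumed behavior).
        v = max(-32768, min(32767, v))
        scaled.append(v)
    return scaled


captured = array.array("h", [0, 1000, -1000, 20000, -20000])
print(scale_pcm16(captured, 1.0).tolist())  # default point: unchanged
print(scale_pcm16(captured, 2.0).tolist())  # amplified; loud samples are clipped
print(scale_pcm16(captured, 0.0).tolist())  # lowest position: muted
```

Because the scaling happens after capture, it cannot recover a signal that was already too quiet or distorted at the hardware level, which is why the device's own input volume still matters.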

If the input is too loud or too quiet on a desktop OS (Win/Mac/Linux), you should also check whether the audio stream is properly captured from your default audio input device.

Pause and resume speech recognition

You can pause and resume the ASR engine manually from the menu.

Pause menu. Tap this to pause the ASR engine, and tap it again to resume.

Last modified May 7, 2020: Re-organizing further (4b20b5d)