Setting up the Julius Voice Recognition Engine

Plugin_Julius is a plugin that provides voice recognition using the Julius voice recognition engine, which is notable for its compact, lightweight operation. This page explains the plugin's settings, its messages, and how to use it.

.mdf Configuration #

Plugin_Julius_conf, Plugin_Julius_lang (Required)

The configuration name and language name of the voice recognition engine.

There is no default value. Plugin_Julius is activated only when a model has been prepared and a valid combination of these two values is specified in the .mdf file.

Combinations supported by the default model:

  • dnn, ja
  • dnn, en
  • gmm, ja

Plugin_Julius_conf=dnn
Plugin_Julius_lang=en
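
As a rough illustration of the rule above, the following Python sketch reads key=value lines and checks whether the configured pair is one of the combinations supported by the default model. It is not part of MMDAgent-EX: the helper names are hypothetical, and it assumes the relevant .mdf settings are plain key=value lines like the ones shown here.

```python
# Combinations listed above as supported by the default model.
DEFAULT_COMBINATIONS = {("dnn", "ja"), ("dnn", "en"), ("gmm", "ja")}


def read_key_values(text):
    """Collect key=value lines into a dict (assumed layout, see lead-in)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key=value
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def julius_combination(settings):
    """Return (conf, lang), or None when either required key is missing."""
    conf = settings.get("Plugin_Julius_conf")
    lang = settings.get("Plugin_Julius_lang")
    if conf is None or lang is None:
        return None  # no default exists, so the plugin would not be activated
    return (conf, lang)


if __name__ == "__main__":
    example = "Plugin_Julius_conf=dnn\nPlugin_Julius_lang=en\n"
    pair = julius_combination(read_key_values(example))
    print(pair, pair in DEFAULT_COMBINATIONS)
```

A pair outside DEFAULT_COMBINATIONS is not necessarily invalid; it only means that a custom model and configuration file would have to be supplied.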

Plugin_Julius_wordspacing

Specify whether to separate words in the recognition result output.

  • no: Pack without putting anything between words (default for ja)
  • yes: Insert a space between words (default for languages other than ja)
  • comma: Insert a comma between words (compatible with old MMDAgent)

Plugin_Julius_wordspacing=yes
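
To make the three modes concrete, here is a small Python sketch of how a list of recognized words would be joined under each setting. It only mirrors the behavior described above and is not the plugin's actual implementation; the function names are hypothetical.

```python
# Separator inserted between words for each Plugin_Julius_wordspacing value.
SEPARATORS = {"no": "", "yes": " ", "comma": ","}


def default_wordspacing(lang):
    """Default mode as described above: "no" for ja, "yes" otherwise."""
    return "no" if lang == "ja" else "yes"


def join_words(words, mode):
    """Join recognized words according to the given wordspacing mode."""
    if mode not in SEPARATORS:
        raise ValueError(f"unknown wordspacing mode: {mode!r}")
    return SEPARATORS[mode].join(words)


if __name__ == "__main__":
    words = ["hello", "world"]
    for mode in ("no", "yes", "comma"):
        print(mode, "->", join_words(words, mode))
```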

Plugin_Julius_logfile

Outputs the internal log of the Julius engine to a file.

Plugin_Julius_logfile=log.txt

show_caption

Displays subtitles. The voice recognition results are displayed on the left side of the screen and the voice synthesis content (the sentence given with SYNTH_START) is displayed on the right side. Set to false to disable it.

show_caption=true

Event Messages #

RECOG_EVENT_START

Output when voice input is detected.

RECOG_EVENT_START

RECOG_EVENT_STOP

Output when recognition results are obtained.

RECOG_EVENT_STOP|Recognition result sentence

RECOG_EVENT_OVERFLOW

Output when the input sound level is too high and causes an overflow.

RECOG_EVENT_OVERFLOW

RECOG_EVENT_MODIFY

Output when the processing of the RECOG_MODIFY message is complete.

RECOG_EVENT_MODIFY|GAIN
RECOG_EVENT_MODIFY|USERDICT_SET
RECOG_EVENT_MODIFY|USERDICT_UNSET
RECOG_EVENT_MODIFY|CHANGE_CONF|(jconf_file_prefix)

RECOG_EVENT_AWAY

Output when voice recognition is temporarily suspended (ON) or restarted (OFF) due to menu operations or external control.

RECOG_EVENT_AWAY|ON
RECOG_EVENT_AWAY|OFF

RECOG_EVENT_GMM

Output with the identification result tag when Julius’s environmental sound identification function is in use.

RECOG_EVENT_GMM|noise
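
All of the event messages above share the same shape: a message name followed by optional arguments separated by "|". The Python sketch below shows one way an external program could split such a message. It is a minimal example under that assumption; RecogEvent and parse_recog_event are hypothetical names, not part of MMDAgent-EX.

```python
from typing import List, NamedTuple, Optional


class RecogEvent(NamedTuple):
    name: str        # e.g. "RECOG_EVENT_STOP"
    args: List[str]  # arguments after the name, possibly empty


def parse_recog_event(message: str) -> Optional[RecogEvent]:
    """Split a "NAME|arg|arg" message; return None for non-RECOG events."""
    name, _, rest = message.strip().partition("|")
    if not name.startswith("RECOG_EVENT_"):
        return None
    if not rest:
        return RecogEvent(name, [])
    if name == "RECOG_EVENT_STOP":
        # Keep the recognized sentence as one argument.
        return RecogEvent(name, [rest])
    return RecogEvent(name, rest.split("|"))


if __name__ == "__main__":
    for msg in ("RECOG_EVENT_START",
                "RECOG_EVENT_STOP|hello world",
                "RECOG_EVENT_AWAY|ON"):
        print(parse_recog_event(msg))
```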

Command Messages #

RECOG_MODIFY

Modifies the settings of the running engine dynamically. The following sub-commands are available:

  • GAIN: Amplitude scaling factor of the input voice (default 1.0)
  • USERDICT_SET: Load user dictionary (if it’s already loaded, it will be replaced)
  • USERDICT_UNSET: Delete user dictionary
  • CHANGE_CONF: Restart the engine with the specified jconf configuration file

RECOG_MODIFY|GAIN|(scale)
RECOG_MODIFY|USERDICT_SET|(dict_file_path)
RECOG_MODIFY|USERDICT_UNSET
RECOG_MODIFY|CHANGE_CONF|(jconf_file_prefix)
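
For programs that generate these commands, a small helper can keep the message format consistent. The Python sketch below builds the message text using the RECOG_MODIFY command name from the heading above. The helper name is hypothetical, and how the resulting string is delivered to MMDAgent-EX is outside the scope of this example.

```python
# Whether each RECOG_MODIFY sub-command takes an argument, per the list above.
SUB_COMMANDS = {
    "GAIN": True,
    "USERDICT_SET": True,
    "USERDICT_UNSET": False,
    "CHANGE_CONF": True,
}


def recog_modify(sub_command, argument=None):
    """Build a RECOG_MODIFY message string for the given sub-command."""
    if sub_command not in SUB_COMMANDS:
        raise ValueError(f"unknown sub-command: {sub_command!r}")
    needs_argument = SUB_COMMANDS[sub_command]
    if needs_argument and argument is None:
        raise ValueError(f"{sub_command} requires an argument")
    if not needs_argument and argument is not None:
        raise ValueError(f"{sub_command} takes no argument")
    parts = ["RECOG_MODIFY", sub_command]
    if needs_argument:
        parts.append(str(argument))
    return "|".join(parts)


if __name__ == "__main__":
    print(recog_modify("GAIN", 1.5))
    print(recog_modify("USERDICT_UNSET"))
```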

RECOG_RECORD_START

Starts automatic recording of the input voice. The cut-out voice fragments are sequentially saved as individual .wav files in the specified directory.

RECOG_RECORD_START|(directory)

RECOG_RECORD_STOP

Stops automatic recording of the input voice.

RECOG_RECORD_STOP

Synchronization of Audio Input Status #

During operation, morphs with the following names are automatically updated in every displayed model according to the state of the audio input. Models that do not have these morphs are unaffected.

  • Morph “volume”: Volume value of audio input (0.0~1.0)
  • Morph “trigger”: 1.0 when the audio input is voice, 0.0 when it’s not

With these morphs you can, for example, animate a model in sync with the input volume or switch what is displayed depending on whether voice input is active, which makes the content more interactive.

In addition, the volume of the audio input is continuously written to the KeyValue “Julius_MaxVol”.

Customization #

Content Dictionary (.dic) #

You can expand the recognition vocabulary by preparing a dictionary that defines unknown words. A per-content dictionary is placed inside the content, using the name of the .mdf file with its extension changed to .dic (for foobar.mdf, the dictionary is foobar.dic). Plugin_Julius searches for this .dic file at startup and, if it is found, loads it as an additional user dictionary.

Per Content Settings (.jconf) #

Similarly, if a file such as foobar.jconf exists, Plugin_Julius reads it as an additional configuration file. This makes it possible to provide different Julius parameters and settings for each piece of content.
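
The naming rule for both per-content files can be expressed in a few lines of Python. This sketch only derives the expected file names from an .mdf path; the function name is hypothetical and the example path is made up for illustration.

```python
from pathlib import PurePosixPath


def content_companion_files(mdf_path):
    """Return the .dic and .jconf paths that go with a given .mdf file."""
    mdf = PurePosixPath(mdf_path)
    return {
        "dic": str(mdf.with_suffix(".dic")),      # additional user dictionary
        "jconf": str(mdf.with_suffix(".jconf")),  # additional Julius settings
    }


if __name__ == "__main__":
    # "example/foobar.mdf" is an illustrative path, not a required layout.
    print(content_companion_files("example/foobar.mdf"))
```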

Further Expansion such as Adding Models #

The plugin incorporates the latest original version of Julius in full, so complete customization is possible. You can use all the features, models, and settings that Julius itself offers. For example, by preparing a Julius language model and acoustic model for a given language, you can add support for that language.

When using customized models or dictionaries, place the Julius configuration file in Release/AppData/Julius and name it jconf_(configuration name)_(language name).txt. When that configuration name and language name are specified in the .mdf file, Plugin_Julius starts with that configuration file.
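
The relationship between the two .mdf values and the configuration file name can be sketched in Python as follows. The function name is hypothetical, and the directory is written relative to the MMDAgent-EX installation as in the text above.

```python
from pathlib import PurePosixPath

# Directory named in the text above, relative to the installation.
JULIUS_DIR = PurePosixPath("Release/AppData/Julius")


def jconf_path(conf, lang):
    """Build the configuration file path for a conf/lang pair."""
    return str(JULIUS_DIR / f"jconf_{conf}_{lang}.txt")


if __name__ == "__main__":
    # Plugin_Julius_conf=dnn and Plugin_Julius_lang=en
    print(jconf_path("dnn", "en"))
```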

When you want to use other engines #

Julius is a compact, open-source speech recognition engine, but it is built on older technology, so its recognition accuracy, especially in noisy environments, is lower than that of the latest speech recognition engines.

If you build a speech recognition system in Python using an engine such as Google STT or Whisper, you can connect it to MMDAgent-EX in either of two ways:

  • Run it as a submodule of MMDAgent-EX with Plugin_AnyScript
  • Run it as a separate process that communicates with MMDAgent-EX through the WebSocket feature

Please refer to the relevant documentation for each.
