About Audio API

This page describes the audio APIs that MMDAgent-EX uses on each platform and the parameters it sets on them.

Android

On Android 8.0 and later, MMDAgent-EX uses AAudio, the newer Android C API, for audio recording and playback. On older devices where AAudio is not available, OpenSL ES for Android is used instead. Google's Oboe library is used to support both APIs.

The setup parameters are shown below. On AAudio, the speech recognition profile is experimentally set as the input preset. No detailed description of this preset is currently available, but it appears to at least select the lowest latency on most devices and to enable automatic gain control (AGC) on some devices. Additional speech processing may be added depending on Google's future specification changes.

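   oboe::AudioStreamBuilder builder;
   builder.setFormat(oboe::AudioFormat::I16);                        // 16-bit signed integer PCM samples
   builder.setPerformanceMode(oboe::PerformanceMode::LowLatency);    // request the low-latency path
   builder.setInputPreset(oboe::InputPreset::VoiceRecognition);      // speech recognition input preset
   builder.setContentType(oboe::ContentType::Speech);                // mark the stream content as speech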

iOS

MMDAgent-EX uses the AudioUnit framework. Apple's built-in voice processing is experimentally enabled for the input stream. No detailed description of the processing is available, but it appears to enable automatic gain control (AGC), noise suppression, and echo cancellation.

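   AudioComponentDescription ioUnitDescription;
   ioUnitDescription.componentType          = kAudioUnitType_Output;
   // The VoiceProcessingIO subtype is what turns on Apple's voice processing.
   ioUnitDescription.componentSubType       = kAudioUnitSubType_VoiceProcessingIO;
   ioUnitDescription.componentManufacturer  = kAudioUnitManufacturer_Apple;
   // Unused fields are zeroed so the description is fully initialized.
   ioUnitDescription.componentFlags         = 0;
   ioUnitDescription.componentFlagsMask     = 0;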

Windows

For speech recognition and synthesis, the DirectSound API is used.

You can choose which audio device to open by setting the environment variable “PORTAUDIO_DEV” or “PORTAUDIO_DEV_NUM”. The list of available audio devices is printed in the MMDAgent-EX startup log, like this:

id [desc1: desc2]
id [desc1: desc2]
...

Before launching MMDAgent-EX, set the environment variable “PORTAUDIO_DEV” to the desc1: desc2 string of the target device, or set “PORTAUDIO_DEV_NUM” to the id number of the target device.
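
The selection rule described above can be sketched roughly as follows. This is only an illustration and not MMDAgent-EX's actual code: the `AudioDevice` struct and the `selectDevice` function are hypothetical names, and checking the description string before the id number is an assumption, since this page does not state which variable takes precedence.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record mirroring one "id [desc1: desc2]" line of the startup log.
struct AudioDevice {
    int id;
    std::string description;  // the "desc1: desc2" part
};

// Returns the id of the device chosen by the two settings, or -1 if neither
// matches. A null pointer means the corresponding variable is not set.
// Assumption: the description string is checked before the id number.
int selectDevice(const std::vector<AudioDevice>& devices,
                 const char* devName, const char* devNum) {
    if (devName != nullptr) {
        for (const AudioDevice& d : devices) {
            if (d.description == devName) return d.id;
        }
    }
    if (devNum != nullptr) {
        char* end = nullptr;
        long n = std::strtol(devNum, &end, 10);
        // Accept the value only if the whole string parsed as a number.
        if (end != devNum && *end == '\0') {
            for (const AudioDevice& d : devices) {
                if (d.id == n) return d.id;
            }
        }
    }
    return -1;  // no match: the caller would fall back to a default device
}
```

A caller could pass `std::getenv("PORTAUDIO_DEV")` and `std::getenv("PORTAUDIO_DEV_NUM")` as the last two arguments, together with the device list it enumerated at startup.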

For sound playback with the SOUND_START command, the Media Control Interface (MCI) is used.

macOS

The Core Audio interface is used on macOS. The default device set in the audio profile is opened.

Linux

The ALSA audio interface is used on Linux. The default device set in the audio profile is used.


Last updated 2021.01.08: copied en to ja (adefa8b)