Wave Arts VQE  1.00
Voice Quality Enhancement
Wave Arts VQE

Introduction

Wave Arts VQE (Voice Quality Enhancement) is a software library implementing acoustic echo cancellation for voice over IP (VoIP) applications.

An application integrating VQE uses the VQE API to open audio devices and stream audio; all echo cancellation and other signal processing is built in. VQE is cross-platform, currently supporting Windows and Mac OS X.

If you are using PJSIP, integration code is provided that implements a PJSIP audio device using VQE; you need only enable the VQE audio driver in the PJSIP configuration.

There is also a standalone VQE demo application with a JUCE user interface, which demonstrates VQE and provides additional example code. The demo application plays speech that asks you questions, records your answers, and then plays back the echo-cancelled recording of your answers. The application has a built-in user guide.

Wave Arts VQE is released under a dual GPL/commercial license. Developers can download the source to test-drive it or use it in GPL open source applications; using it in a closed source application requires purchasing a commercial license.

Please see the VQE feature description for more information about the signal processing features of VQE. This doxygen guide will focus on how to use VQE.

VQE Quick Start Guide

This section gives a brief overview of how to integrate VQE into your application. Remember that VQE is used for all audio device discovery and streaming. First, create a Wave Arts Audio Device object:

        #include "WaVQE.h"

        // make the WaAudioDev object
        WaAudioDev *wad = WaAudioDev::WadMake(0);
        // initialize API and discover devices
        wad->Init();

WadMake creates an audio device object using an audio API appropriate for your platform. For example, on Windows 7 WadMake makes a WASAPI object, on Windows XP an MME object, and on Mac OS X a CoreAudio object.

The wad is used to discover devices and can be used to create an audio stream. A wad can create a record stream from an input device, a playback stream to an output device, or a bidirectional stream to a pair of input and output devices.

As an example of device discovery, the following code will print all the input and output devices:

        WadDevInfo devInfo;
        int i, numDev;

        wad->GetNumDevices(&numDev);
        for (i = 0; i < numDev; i++) {
                wad->GetDevInfo(i, &devInfo);
                printf("device %d: name '%s' numInChan %d numOutChan %d\n",
                        i, devInfo.name, devInfo.numInChan, devInfo.numOutChan);
        }
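The device indices used later for streaming come from this discovery loop. As a minimal sketch of one way to pick them, the code below selects the first device with input channels and the first with output channels. DevInfoSketch is a hypothetical stand-in for WadDevInfo that holds only the three fields printed above, and findFirstDevice is an illustrative helper, not part of the VQE API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for WadDevInfo, limited to the fields printed above.
struct DevInfoSketch {
    std::string name;
    int numInChan;
    int numOutChan;
};

// Returns the index of the first device with at least one input channel
// (wantInput == true) or output channel (wantInput == false), or -1 if
// no such device exists.
int findFirstDevice(const std::vector<DevInfoSketch> &devs, bool wantInput) {
    for (std::size_t i = 0; i < devs.size(); ++i) {
        int chans = wantInput ? devs[i].numInChan : devs[i].numOutChan;
        if (chans > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Usage check on a made-up device list.
void findFirstDeviceExample() {
    std::vector<DevInfoSketch> devs = {{"Mic", 1, 0}, {"Speakers", 0, 2}};
    assert(findFirstDevice(devs, true) == 0);
    assert(findFirstDevice(devs, false) == 1);
}
```

In a real application the list would be filled from GetDevInfo, and a result of -1 should be treated as "no suitable device".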

Most WaAudioDev methods return an integer WadStatus, where 0 (WAD_OK) is success and non-zero is failure. For brevity, these examples omit error checking.
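A minimal sketch of the kind of status check the examples leave out is shown below. It relies only on the convention stated above, that 0 means success. The names kWadOkSketch and wadSucceeded are hypothetical helpers for illustration, not part of the VQE API.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for WAD_OK, which is 0 according to the text above.
const int kWadOkSketch = 0;

// Hypothetical helper: returns true on success. On failure it reports the
// name of the failing step and the status code on stderr.
bool wadSucceeded(int status, const char *step) {
    assert(step != nullptr);
    if (status == kWadOkSketch) {
        return true;
    }
    std::fprintf(stderr, "%s failed, status %d\n", step, status);
    return false;
}
```

Assuming a call such as Init returns a WadStatus, it could be wrapped as `if (!wadSucceeded(wad->Init(), "Init")) { /* give up */ }`.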

Once you have a wad object, you can create a VQE object. The following code initializes the VQE parameter structure to sensible defaults and creates a VQE object:

        VQEParam vqeParam;
        WaVQE::GetDefaultParam(&vqeParam);
        WaVQE *vqe = new WaVQE(wad, &vqeParam);

Now we can open a pair of devices for streaming. To do this we first initialize a WadParam struct which specifies the streaming devices and format.

        WadParam param;
        LibClear(&param, sizeof(param));
        param.priority = WadPriorityMax;
        // recording
        param.inDev = 0;
        param.inFmt.sampRate = 16000;
        param.inFmt.sampFmt = WadFloat32;
        param.inFmt.numChan = 1;
        param.inFmt.framesPerBuf = 320;
        // playback
        param.outDev = 1;
        param.outFmt = param.inFmt;

The above code clears the parameter struct and fills in values to record from device 0 and play back to device 1, using the indices determined during device discovery. Both recording and playback use a 16 kHz sampling rate and 32-bit float format, with 320-sample buffers (20 ms). Now we can open the devices and start streaming:

        // open the devices
        int status = vqe->Open(&param, InCallback, OutCallback, NULL);
        if (status != WAD_OK) {
                printf("can't open devices, status %d: '%s'\n",
                        status, wad->GetErrorText());
        } else {
                // start streaming only if the devices opened successfully
                vqe->Start();
        }
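The framesPerBuf value of 320 used in the WadParam setup follows from the sampling rate and the desired buffer duration: 16000 samples/s times 0.020 s is 320 frames. The sketch below shows that arithmetic for other rates and durations. Both function names are hypothetical, and the byte count assumes the buffer holds framesPerBuf times numChan 32-bit float samples, which is not confirmed by the VQE documentation.

```cpp
#include <cassert>
#include <cstddef>

// Number of sample frames in a buffer of the given duration.
// Integer arithmetic; exact when sampRate * msec is divisible by 1000.
int framesForDuration(int sampRate, int msec) {
    assert(sampRate > 0 && msec > 0);
    return sampRate * msec / 1000;
}

// Bytes needed for one buffer of 32-bit float samples, assuming the buffer
// holds frames * numChan samples (an assumption about the layout).
std::size_t bufferBytesFloat32(int frames, int numChan) {
    assert(frames > 0 && numChan > 0);
    return static_cast<std::size_t>(frames) *
           static_cast<std::size_t>(numChan) * sizeof(float);
}
```

For example, framesForDuration(16000, 20) gives the 320 frames used above, and the same 20 ms at 48 kHz would need 960 frames.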

The Open call takes callback functions which are called to deliver input samples to the application and to fetch output samples for playback, plus an optional pointer argument passed to the callbacks. Open creates a single I/O thread which issues the callbacks. The callback definition is below:

        typedef int WadCallbackFn(void *buf, WadStreamFormat *fmt, void *arg);

The callback is called with a pointer to the sample buffer, the format description provided to the Open call, and the optional pointer argument. The input callback should read the samples from the buffer; the output callback should fill the buffer with samples to play. The callback should return zero to stop streaming, or non-zero to continue streaming. When done streaming, close the stream and delete the VQE and wad objects:

        vqe->Close();
        delete vqe;
        delete wad;
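The callback contract described above (read from the buffer on input, fill it on output, return non-zero to keep streaming) can be sketched as follows. StreamFormatSketch is a hypothetical stand-in for WadStreamFormat with only the fields set in the quick start, AppState is an illustrative application struct passed through the optional pointer argument, and the sketch assumes the buffer holds framesPerBuf times numChan float samples as configured with WadFloat32.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for WadStreamFormat (fields taken from the quick start).
struct StreamFormatSketch {
    int sampRate;
    int numChan;
    int framesPerBuf;
};

// Illustrative application state, passed via the optional pointer argument.
struct AppState {
    std::vector<float> recorded;  // echo-cancelled input collected so far
    std::size_t maxSamples;       // stop streaming once this many are stored
};

// Number of float samples in one buffer, under the layout assumption above.
static std::size_t samplesPerBuf(const StreamFormatSketch *fmt) {
    return static_cast<std::size_t>(fmt->framesPerBuf) *
           static_cast<std::size_t>(fmt->numChan);
}

// Input callback: copy the delivered samples into the application state.
// Returns 0 to stop streaming once enough samples are stored, else non-zero.
int InCallbackSketch(void *buf, StreamFormatSketch *fmt, void *arg) {
    assert(buf != nullptr && fmt != nullptr && arg != nullptr);
    AppState *app = static_cast<AppState *>(arg);
    const float *in = static_cast<const float *>(buf);
    app->recorded.insert(app->recorded.end(), in, in + samplesPerBuf(fmt));
    return app->recorded.size() < app->maxSamples ? 1 : 0;
}

// Output callback: fill the buffer with silence and keep streaming.
int OutCallbackSketch(void *buf, StreamFormatSketch *fmt, void *arg) {
    assert(buf != nullptr && fmt != nullptr);
    (void)arg;  // unused in this sketch
    float *out = static_cast<float *>(buf);
    std::fill(out, out + samplesPerBuf(fmt), 0.0f);
    return 1;
}
```

Because the callbacks run on the I/O thread created by Open, a real application would hand samples to other threads through a thread-safe queue rather than a bare vector.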

Those are the principal steps to creating an echo-cancelled audio stream. The section below describes more about the VQE parameters and runtime options. The reader is also encouraged to look at the following topics:

VQEParam - VQE parameter structure

WaVQE - Audio device port that implements VQE functions

WaAudioDev - Audio device abstraction

WaAudioDevPort - Audio device port abstraction

AecCalibrator - Device calibration

Debugging features of VQE

VQE contains various means for debugging. First there is a logging facility, described in WaLog.h, which outputs useful information to a stdio file pointer. Call WaLogSetLevel to set the log level from 0 (fatal errors) to 3 (fine detail). You can redirect logs to another facility using WaLogSetLogFn.

VQE contains a dumpFile option that records all audio streams to files for later analysis and possible replay. The raw recording prior to AEC is dumped to "in0.wav", the recording after echo cancellation to "in1.wav", the far signal to "out0.wav", and the playback signal after processing by the AEC to "out1.wav".

VQE can also dump internal signal levels for later plotting by the MATLAB script "aecplot.m". Set the matlabDump option to enable this.

VQE will also dump the impulse response of the adaptive echo canceller filter if the dumpResponse option is enabled. The response is dumped to the file "resp0.wav".

It is also possible to open the dump files and run VQE by replaying the previously recorded audio. This is done by using the WadFile device in place of a real audio device.

The VQEDemo application is an ideal platform for debugging and development of VQE algorithms. It plays a far file and records the echo cancelled audio into a record file. It also has the capability to generate all of the above dump files, and to replay from previously recorded dump files. Hence one can generate a test recording, and then tweak an algorithm by replaying the same test recording.

Future work

Some principal areas for future development are listed below.

Elimination of latency calibration beep. The audio latency can be measured dynamically using the far signal. I've experimented a bit with cross-correlation of the signals and cross-correlation of the signal envelopes and results were promising.
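As a rough illustration of the cross-correlation idea, the sketch below searches for the lag at which the recorded (near) signal best matches the far signal. It is a brute-force, unnormalized correlation written for clarity, not the experimental code referred to above, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the lag in samples (0..maxLag) at which nearSig correlates most
// strongly with farSig, i.e. the estimated delay of the far signal as it
// appears in the recording. Returns 0 if no lag has positive correlation.
int estimateDelaySamples(const std::vector<float> &farSig,
                         const std::vector<float> &nearSig, int maxLag) {
    assert(maxLag >= 0);
    int bestLag = 0;
    double bestCorr = 0.0;
    for (int lag = 0; lag <= maxLag; ++lag) {
        const std::size_t shift = static_cast<std::size_t>(lag);
        double corr = 0.0;
        // Sum far[n] * near[n + lag] over the overlapping region.
        for (std::size_t n = 0;
             n < farSig.size() && n + shift < nearSig.size(); ++n) {
            corr += static_cast<double>(farSig[n]) * nearSig[n + shift];
        }
        if (corr > bestCorr) {
            bestCorr = corr;
            bestLag = lag;
        }
    }
    return bestLag;
}
```

The cost grows with the signal length times maxLag, so a practical version would likely work on decimated signals or envelopes, or use an FFT-based correlation.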

Improved handling of FIFO underruns and overruns. Right now VQE maintains signal continuity of both input and output streams, even if they are running with large sample rate offsets. However, the reference (far) input to the AEC will have buffers either discarded or repeated, and this can create noticeable artifacts. It would be better to use a WSOLA algorithm, as PJSIP does.

Dynamic estimation of sample rate offset. This is a really hard problem. There is one published algorithm that adapts to the offset, but it adapts fairly slowly. Strategies that count buffers or sample positions are not very accurate and hence require a long time to converge.
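To see why counting strategies converge slowly, consider a back-of-the-envelope sketch. It assumes sample counts are only known to within one buffer, which is an illustrative simplification and not a description of VQE internals. Both function names are hypothetical.

```cpp
#include <cassert>

// Sample rate offset of the input relative to the output, in parts per
// million, from the number of samples each stream has processed.
double offsetPpm(long long inSamples, long long outSamples) {
    assert(outSamples > 0);
    return (static_cast<double>(inSamples) / outSamples - 1.0) * 1e6;
}

// Observation time in seconds before an uncertainty of one buffer in the
// sample count shrinks to the given resolution in ppm.
double secondsToResolve(double ppm, int framesPerBuf, int sampRate) {
    assert(ppm > 0.0 && framesPerBuf > 0 && sampRate > 0);
    return framesPerBuf / (sampRate * ppm * 1e-6);
}
```

With the quick start settings (320-frame buffers at 16 kHz), resolving a 10 ppm offset under this assumption takes about 2000 seconds of observation, which illustrates the slow convergence.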

Increased efficiency and/or tail length of MB FRLS filter. This is pretty simple: increase the number of bands and decimation factor. The cost will be increased signal latency.

Multichannel audio support. The multi-band FRLS adaptive echo canceller is already designed and tested for multichannel support. However, the audio infrastructure above this is currently mono only. It's therefore relatively straightforward to make VQE a true MIMO (multiple in, multiple out) echo canceller. Multichannel will require addition of channel decorrelation.

Linux/ALSA support.
