Acoustic Algorithm Debugging

Last Updated on: 2026-04-09 06:22:39

Overview

This topic describes how to debug acoustic echo cancellation (AEC) and voice activity detection (VAD) in this example. Both features are implemented in src/wukong/audio/wukong_audio_aec_vad.c, and their external interfaces are declared in the header file wukong_audio_aec_vad.h. This version uses Speex AEC with residual echo suppression (AES) and noise suppression, plus an RNN-based VAD. It does not support online parameter tuning over the serial port, so debugging focuses on the initialization parameters and the VAD sensitivity level, combined with captured audio data for analysis.

Test tools

It is recommended to use the official Tuya serial debugging tool tyuTool for joint debugging and data capture.

Supports GUI and CLI modes and communicates with the device over the serial port.
Sends control commands (for example, start/stop recording, play test tones, and capture data from various channels) so that you can compare the signal before and after AEC and evaluate VAD performance.
For installation and basic usage, refer to its GitHub repository documentation. For AI debugging, use the debug ser or debug ser_auto mode.

Test commands (shared with acoustic testing)

In the serial debugging mode of tyuTool, common commands include (refer to the tool’s built-in help for specifics):

Command	Description
`start`/`stop`/`reset`	Start/Stop/Reset recording and processing
`dump 0`	Dump raw data of the microphone
`dump 1`	Dump reference loopback data
`dump 2`	Dump AEC output data (after echo cancellation)
`dump 4`	Dump VAD-related data
`volume <0-100>` / `micgain <0-100>`	Adjust playback volume and microphone gain

By using dump 0, dump 1, and dump 2, you can compare the MIC, REF, and AEC output to evaluate the echo cancellation effect. Combined with the VAD start/stop prints in the logs, you can analyze VAD behavior.

Recommended testing procedure

Connect the device: Use tyuTool to connect to the device’s serial port and enter serial debugging mode.
Reset and start: Send reset, then send start to put the device into the recording and processing state.
Play a test tone or speak naturally: Play bg 0, bg 1… as needed, or simply speak to the device. Observe the audio output and the logs.
Stop and capture: Send stop, then sequentially send dump 0, dump 1, and dump 2 to capture data from the MIC, REF, and AEC channels.
Offline analysis: Import the captured PCM data into a tool such as Audition or ocenaudio. Compare the waveforms and spectrograms before and after AEC to assess how effectively the echo is suppressed and whether the VAD behavior is appropriate.
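Besides visual inspection, the MIC and AEC dumps can be compared numerically. The following is a minimal sketch that assumes the dumps are headerless 16-bit signed mono PCM already loaded into memory, which this topic does not confirm. The function names are hypothetical and are not part of tyuTool or the example code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean energy (average of squared samples) of a 16-bit PCM buffer. */
static double pcm_mean_energy(const int16_t *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    double sum_sq = 0.0;
    for (size_t i = 0; i < count; i++) {
        double s = (double)samples[i];
        sum_sq += s * s;
    }
    return sum_sq / (double)count;
}

/*
 * Ratio of MIC energy to AEC output energy over the same segment.
 * A larger ratio means more energy was removed by echo cancellation.
 */
static double echo_energy_ratio(const int16_t *mic, const int16_t *aec_out,
                                size_t count)
{
    double mic_energy = pcm_mean_energy(mic, count);
    double out_energy = pcm_mean_energy(aec_out, count);
    /* Floor the denominator so digital silence does not divide by zero. */
    if (out_energy < 1.0) {
        out_energy = 1.0;
    }
    return mic_energy / out_energy;
}
```

An energy ratio of 100 corresponds to 20 dB of reduction (10 times the base-10 logarithm of the ratio). The figure is only meaningful on segments where just the speaker is playing; during double-talk, the near-end voice raises the output energy and lowers the ratio.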

Pre-debugging checklist (general)

Before debugging AEC/VAD performance, keep the following items fixed so that changes in the setup do not interfere with your evaluation.

Microphone model and gain: Keep the MIC model and its gain unchanged to prevent changes in input amplitude.
Speaker model and gain: Keep the speaker model and its volume unchanged to prevent changes in the echo path.
Product structure: Evaluate the algorithm only after the enclosure and acoustic cavity are finalized.

AEC and noise reduction debugging

Residual echo suppression (AES)

speex_aes_set_param(handle, value)

Meaning: value is an integer representing the strength of residual echo suppression. The default value in the current source code is 5.
Value suggestions:
- Larger value: Suppresses more residual echo, but interruption/duplex performance may degrade (near-end voice is more likely to be suppressed during double-talk).
- Smaller value: Suppresses less residual echo, preserves more near-end voice, but residual echo may increase.
Adjust this value in the source code based on subjective listening tests and recompile, finding a balance between "low residual echo" and "good interruption/duplex performance".

Noise reduction

speex_ns_set_param(handle, level1, level2)

Meaning: Both level1 and level2 are integers with no strict range limits.
- level1: Noise suppression strength. A larger value means stronger suppression.
- level2: Noise floor level. A smaller value indicates a lower noise floor, allowing more noise to be suppressed.
Tuning suggestions:
- High signal-to-noise ratio (SNR): Choose a large level1 and small level2.
- Low SNR: Choose a small level1 and large level2.
The source code currently uses speex_ns_set_param(handle, 8, 10). Adjust based on the actual environment in the source code and recompile.

Debugging recommendations

Complete the pre-debugging checklist first. Then, use subjective listening tests combined with comparing dump 0/1/2 data.
If residual echo is too high: Try increasing the speex_aes_set_param value.
If the interruption/double-talk performance is poor: Try decreasing the speex_aes_set_param value.
If noise suppression is insufficient or excessive: Adjust level1 and level2 in speex_ns_set_param according to the SNR.
Additionally, combine these with structural/gain adjustments such as reducing speaker volume or increasing the distance between the microphone and the speaker.
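The AES trade-off described above can be written down as a small decision helper to keep tuning iterations consistent. This is only a sketch: the function name, the step size of 1, and the assumption that the value is non-negative are all hypothetical and are not stated by the source code documentation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: suggest the next AES strength to try, given what
 * was heard in the last listening test.
 * Assumption: strengths are non-negative and are tuned in steps of 1.
 */
static int next_aes_strength(int current, bool residual_echo_high,
                             bool double_talk_poor)
{
    assert(current >= 0);
    if (residual_echo_high && !double_talk_poor) {
        return current + 1;                      /* suppress more echo */
    }
    if (double_talk_poor && !residual_echo_high) {
        return current > 0 ? current - 1 : 0;    /* preserve near-end voice */
    }
    /* Either nothing is wrong, or both problems occur at once. In the
     * second case changing the value alone cannot help, so the structural
     * and gain adjustments are the next thing to try. */
    return current;
}
```

The value returned would still have to be written into the speex_aes_set_param call in the source code and the firmware recompiled, because this version has no online tuning.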

VAD debugging

Current implementation

Initialization is performed in wukong_aec_vad_init(), with the following parameters:
- min_speech_len_ms: Minimum valid speech duration (ms). A value that is too small may cause false triggers.
- max_speech_interval_ms: Maximum silence interval (ms). A timeout indicates the end of a sentence.
The wukong_vad_set_threshold(level) function selects the sensitivity level (see APIs below). Internally, this maps to RNN threshold values in dB:
- WUKONG_AUDIO_VAD_HIGH: -40 dB, less prone to false triggering
- WUKONG_AUDIO_VAD_MID: -50 dB, default level
- WUKONG_AUDIO_VAD_LOW: -60 dB, more sensitive and easier to trigger
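The level-to-threshold mapping listed above can be expressed as a lookup, for example to print the active threshold in debug logs. The sketch below redeclares the enum locally so that it compiles on its own; in a real build, the declaration from wukong_audio_aec_vad.h would be used instead. The function name is hypothetical and the actual internal mapping code may look different.

```c
#include <assert.h>

/* Local copy for illustration; mirrors the enum in wukong_audio_aec_vad.h. */
typedef enum {
    WUKONG_AUDIO_VAD_HIGH,
    WUKONG_AUDIO_VAD_MID,
    WUKONG_AUDIO_VAD_LOW,
} WUKONG_AUDIO_VAD_THRESHOLD_E;

/* Hypothetical helper: return the documented threshold in dB for a level. */
static int vad_level_to_threshold_db(WUKONG_AUDIO_VAD_THRESHOLD_E level)
{
    switch (level) {
    case WUKONG_AUDIO_VAD_HIGH: return -40;  /* less prone to false triggering */
    case WUKONG_AUDIO_VAD_MID:  return -50;  /* default level */
    case WUKONG_AUDIO_VAD_LOW:  return -60;  /* more sensitive */
    }
    assert(0 && "unknown VAD level");
    return -50;  /* fall back to the default when asserts are disabled */
}
```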

Debugging recommendations

If missed detection occurs (speech is present but not detected): Switch to WUKONG_AUDIO_VAD_LOW, or appropriately increase max_speech_interval_ms.
If false triggering occurs (silence is detected as speech): Switch to WUKONG_AUDIO_VAD_HIGH, or appropriately increase min_speech_len_ms.
Cross-check the detected boundaries against the [vad start]/[vad stop] entries in the serial log and the captured dump 4 data to verify that they are reasonable.
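To reason about how min_speech_len_ms and max_speech_interval_ms shape the start and stop boundaries, it can help to model them as a per-frame state machine. The sketch below is an illustrative model only and is not the logic in wukong_audio_aec_vad.c; the type and function names are hypothetical, and it assumes a per-frame voiced/unvoiced decision is already available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_speech_len_ms;       /* voiced run needed to declare start */
    uint32_t max_speech_interval_ms;  /* silence run needed to declare stop */
    uint32_t frame_ms;                /* duration of one frame */
    uint32_t voiced_run_ms;           /* consecutive voiced time while idle */
    uint32_t silence_run_ms;          /* consecutive silence while in speech */
    bool in_speech;
} vad_timing_t;

static void vad_timing_init(vad_timing_t *t, uint32_t min_speech_len_ms,
                            uint32_t max_speech_interval_ms, uint32_t frame_ms)
{
    assert(t != NULL && frame_ms > 0);
    t->min_speech_len_ms = min_speech_len_ms;
    t->max_speech_interval_ms = max_speech_interval_ms;
    t->frame_ms = frame_ms;
    t->voiced_run_ms = 0;
    t->silence_run_ms = 0;
    t->in_speech = false;
}

/* Feed one frame decision; returns true while a sentence is in progress. */
static bool vad_timing_update(vad_timing_t *t, bool frame_is_voiced)
{
    assert(t != NULL);
    if (!t->in_speech) {
        if (frame_is_voiced) {
            t->voiced_run_ms += t->frame_ms;
            if (t->voiced_run_ms >= t->min_speech_len_ms) {
                t->in_speech = true;      /* corresponds to a start event */
                t->silence_run_ms = 0;
            }
        } else {
            t->voiced_run_ms = 0;         /* short burst rejected */
        }
    } else if (frame_is_voiced) {
        t->silence_run_ms = 0;            /* pause bridged */
    } else {
        t->silence_run_ms += t->frame_ms;
        if (t->silence_run_ms >= t->max_speech_interval_ms) {
            t->in_speech = false;         /* corresponds to a stop event */
            t->voiced_run_ms = 0;
        }
    }
    return t->in_speech;
}
```

In this model, a larger min_speech_len_ms rejects short noise bursts at the cost of a later start, and a larger max_speech_interval_ms keeps a sentence open across longer pauses at the cost of a later stop.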

API reference (consistent with the current code)

Interface definitions are in src/wukong/audio/wukong_audio_aec_vad.h, with implementations in wukong_audio_aec_vad.c.

Initialization and deinitialization

/**
 * Initialize AEC and VAD modules (creates Speex AEC and RNN VAD internally).
 * @param min_speech_len_ms   Minimum valid speech duration (ms).
 * @param max_speech_interval_ms   Maximum silence interval (ms). A timeout indicates the end of a sentence.
 * @param frame_size           Frame length (e.g., 320 for 20 ms at 16 kHz sampling rate).
 */
OPERATE_RET wukong_aec_vad_init(UINT32_T min_speech_len_ms, UINT32_T max_speech_interval_ms, UINT32_T frame_size);

OPERATE_RET wukong_aec_vad_deinit(VOID);
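The frame_size example in the comment above (320 for 20 ms at 16 kHz) follows from sample rate multiplied by frame duration. A minimal sketch of that arithmetic, using a hypothetical helper name and assuming frame_size is counted in samples as the example suggests:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of samples in one frame. */
static uint32_t frame_size_samples(uint32_t sample_rate_hz, uint32_t frame_ms)
{
    assert(sample_rate_hz > 0 && frame_ms > 0);
    /* Widen to 64 bits so the intermediate product cannot overflow. */
    return (uint32_t)(((uint64_t)sample_rate_hz * frame_ms) / 1000u);
}
```

For instance, frame_size_samples(16000, 20) yields 320, which is the value that would be passed as frame_size to wukong_aec_vad_init().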

Data processing (invoked by the audio pipeline per frame)

/**
 * Feed one frame of data for AEC and VAD processing.
 * @param mic_data  Microphone input data.
 * @param ref_data  Reference signal (e.g., speaker echo capture).
 * @param out_data  AEC output data (to be sent to RNN VAD).
 */
OPERATE_RET wukong_aec_vad_process(INT16_T *mic_data, INT16_T *ref_data, INT16_T *out_data);

VAD sensitivity & control

typedef enum {
    WUKONG_AUDIO_VAD_HIGH,  // Threshold -40 dB, less prone to false triggering.
    WUKONG_AUDIO_VAD_MID,   // Threshold -50 dB, default.
    WUKONG_AUDIO_VAD_LOW,   // Threshold -60 dB, more sensitive.
} WUKONG_AUDIO_VAD_THRESHOLD_E;

/** Set the VAD sensitivity level. This is the only interface for runtime VAD adjustment. */
OPERATE_RET wukong_vad_set_threshold(WUKONG_AUDIO_VAD_THRESHOLD_E level);

/** Manually start/stop VAD detection. */
OPERATE_RET wukong_vad_start(VOID);
OPERATE_RET wukong_vad_stop(VOID);

/** Get the current VAD status: WUKONG_AUDIO_VAD_START or WUKONG_AUDIO_VAD_STOP */
INT_T wukong_vad_get_flag(VOID);

Support

If you have any problems with TuyaOS development, you can post your questions in the Tuya Developer Forum.
