This topic describes how to debug acoustic echo cancellation (AEC) and voice activity detection (VAD) in this example. These features are implemented in src/wukong/audio/wukong_audio_aec_vad.c, with their external interfaces defined in the header file wukong_audio_aec_vad.h. This version uses Speex AEC + AES + noise suppression and RNN VAD, without online serial port parameter tuning. Debugging primarily focuses on initialization parameters and VAD sensitivity levels, combined with capturing audio data for analysis.
It is recommended to use the official Tuya serial debugging tool tyuTool for joint debugging and data capture.
debug ser or debug ser_auto mode.
In the serial debugging mode of tyuTool, common commands include (refer to the tool’s built-in help for specifics):
| Command | Description |
|---|---|
| start/stop/reset | Start/Stop/Reset recording and processing |
| dump 0 | Dump raw data of the microphone |
| dump 1 | Dump reference loopback data |
| dump 2 | Dump AEC output data (after echo cancellation) |
| dump 4 | Dump VAD-related data |
| volume <0-100> / micgain <0-100> | Adjust playback volume and microphone gain |
By using dump 0, dump 1, and dump 2, you can compare the MIC, REF, and AEC output to evaluate the echo cancellation effect. Combined with the VAD start/stop prints in the logs, you can analyze VAD behavior.
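The MIC versus AEC comparison described above can be made quantitative with a simple power ratio. The following is a minimal sketch, not part of the example source: it assumes the dump 0 and dump 2 captures have already been decoded into time-aligned 16-bit mono PCM buffers of equal length, and the function names `pcm_mean_power` and `echo_power_ratio` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean power of a block of 16-bit PCM samples (0.0 for an empty block). */
double pcm_mean_power(const int16_t *pcm, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += (double)pcm[i] * (double)pcm[i];
    }
    return (n > 0) ? acc / (double)n : 0.0;
}

/*
 * Ratio of raw MIC power (dump 0) to AEC output power (dump 2).
 * Assumption: both buffers cover the same time span, the speaker is
 * playing, and nobody is talking, so the MIC signal is mostly echo.
 * A larger ratio means more echo was removed; 10*log10(ratio) gives dB.
 */
double echo_power_ratio(const int16_t *mic, const int16_t *aec_out, size_t n)
{
    const double eps = 1e-9; /* avoids division by zero on a silent output */
    return pcm_mean_power(mic, n) / (pcm_mean_power(aec_out, n) + eps);
}
```

A ratio of 100, for example, corresponds to 20 dB of echo attenuation. Evaluate it only over echo-only segments; during double-talk the near-end speech is supposed to survive, so a low ratio there does not indicate a problem.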
1. Send reset, then send start to put the device into the recording and processing state.
2. Use bg 0, bg 1… as needed, or simply speak to the device. Observe the audio effect and logs.
3. Send stop, then send dump 0, dump 1, and dump 2 in sequence to capture data from the MIC, REF, and AEC channels.

Before debugging AEC/VAD performance, it is recommended to keep the following items fixed so that environmental changes do not interfere with judgment.
speex_aes_set_param(handle, value)
value is an integer representing the strength of residual echo suppression. The default value in the current source code is 5.

speex_ns_set_param(handle, level1, level2)
level1 and level2 are integers with no strict range limits.
- level1: Noise suppression strength. A larger value means stronger suppression.
- level2: Noise floor level. A smaller value indicates a lower noise floor, allowing more noise to be suppressed.

Example: speex_ns_set_param(handle, 8, 10). Adjust the values in the source code based on the actual environment, then recompile.

1. Capture and compare the dump 0/1/2 data.
2. Tune the speex_aes_set_param value. If the interruption/double-talk performance is poor: Try decreasing this value.
3. Adjust level1 and level2 in speex_ns_set_param according to the SNR.

The VAD timing is configured in wukong_aec_vad_init(), with the following parameters:
- min_speech_len_ms: Minimum valid speech duration (ms). A value too small might cause false triggers.
- max_speech_interval_ms: Maximum silence interval (ms). A timeout indicates the end of a sentence.

The wukong_vad_set_threshold(level) function selects the sensitivity level (see APIs below). Internally, this maps to RNN threshold values in dB:
- WUKONG_AUDIO_VAD_HIGH: -40 dB, less prone to false triggering
- WUKONG_AUDIO_VAD_MID: -50 dB, default level
- WUKONG_AUDIO_VAD_LOW: -60 dB, more sensitive and easier to trigger

- If speech is missed or a sentence is cut off too early: Use WUKONG_AUDIO_VAD_LOW, or appropriately increase max_speech_interval_ms.
- If false triggers occur: Use WUKONG_AUDIO_VAD_HIGH, or appropriately increase min_speech_len_ms.
- Check the [vad start]/[vad stop] entries in the serial port logs, along with the captured dump 4 data, to verify that the VAD behavior is reasonable.

Interface definitions are in src/wukong/audio/wukong_audio_aec_vad.h, with implementations in wukong_audio_aec_vad.c.
```c
/**
 * Initialize AEC and VAD modules (creates Speex AEC and RNN VAD internally).
 * @param min_speech_len_ms Minimum valid speech duration (ms).
 * @param max_speech_interval_ms Maximum silence interval (ms). A timeout indicates the end of a sentence.
 * @param frame_size Frame length (e.g., 320 for 20 ms at 16 kHz sampling rate).
 */
OPERATE_RET wukong_aec_vad_init(UINT32_T min_speech_len_ms, UINT32_T max_speech_interval_ms, UINT32_T frame_size);
OPERATE_RET wukong_aec_vad_deinit(VOID);

/**
 * Feed one frame of data for AEC and VAD processing.
 * @param mic_data Microphone input data.
 * @param ref_data Reference signal (e.g., speaker echo capture).
 * @param out_data AEC output data (to be sent to RNN VAD).
 */
OPERATE_RET wukong_aec_vad_process(INT16_T *mic_data, INT16_T *ref_data, INT16_T *out_data);

typedef enum {
    WUKONG_AUDIO_VAD_HIGH, // Threshold -40 dB, less prone to false triggering.
    WUKONG_AUDIO_VAD_MID,  // Threshold -50 dB, default.
    WUKONG_AUDIO_VAD_LOW,  // Threshold -60 dB, more sensitive.
} WUKONG_AUDIO_VAD_THRESHOLD_E;

/** Set the VAD sensitivity level. This is the only interface for runtime VAD adjustment. */
OPERATE_RET wukong_vad_set_threshold(WUKONG_AUDIO_VAD_THRESHOLD_E level);

/** Manually start/stop VAD detection. */
OPERATE_RET wukong_vad_start(VOID);
OPERATE_RET wukong_vad_stop(VOID);

/** Get the current VAD status: WUKONG_AUDIO_VAD_START or WUKONG_AUDIO_VAD_STOP */
INT_T wukong_vad_get_flag(VOID);
```
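To build intuition for how min_speech_len_ms and max_speech_interval_ms shape the [vad start]/[vad stop] behavior, the sketch below models one common way such timing parameters are applied to per-frame speech decisions. It is an illustration under stated assumptions, not the logic in wukong_audio_aec_vad.c: the `vad_timing_t` type and the `frame_ms`, `vad_timing_init`, and `vad_timing_update` functions are hypothetical, and the real RNN VAD may combine the parameters differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state; not part of wukong_audio_aec_vad.h. */
typedef struct {
    uint32_t min_speech_frames;  /* consecutive speech frames needed to start */
    uint32_t max_silence_frames; /* consecutive silent frames needed to stop  */
    uint32_t speech_run;         /* current run of speech frames while idle   */
    uint32_t silence_run;        /* current run of silent frames while active */
    bool     active;             /* true between "start" and "stop"           */
} vad_timing_t;

/* Frame duration in ms, e.g. 320 samples at 16000 Hz gives 20 ms. */
uint32_t frame_ms(uint32_t frame_size, uint32_t sample_rate_hz)
{
    return frame_size * 1000u / sample_rate_hz;
}

/* Convert a duration to a frame count, rounding up, with a minimum of 1. */
static uint32_t ms_to_frames(uint32_t ms, uint32_t ms_per_frame)
{
    uint32_t frames = (ms + ms_per_frame - 1u) / ms_per_frame;
    return (frames > 0u) ? frames : 1u;
}

/* ms_per_frame must be non-zero. */
void vad_timing_init(vad_timing_t *t, uint32_t min_speech_len_ms,
                     uint32_t max_speech_interval_ms, uint32_t ms_per_frame)
{
    t->min_speech_frames  = ms_to_frames(min_speech_len_ms, ms_per_frame);
    t->max_silence_frames = ms_to_frames(max_speech_interval_ms, ms_per_frame);
    t->speech_run  = 0;
    t->silence_run = 0;
    t->active      = false;
}

/* Feed one per-frame speech decision; returns the smoothed VAD state. */
bool vad_timing_update(vad_timing_t *t, bool frame_is_speech)
{
    if (!t->active) {
        /* A short burst that ends before min_speech_frames resets the run. */
        t->speech_run = frame_is_speech ? t->speech_run + 1u : 0u;
        if (t->speech_run >= t->min_speech_frames) {
            t->active = true;
            t->silence_run = 0;
        }
    } else {
        /* Pauses shorter than max_silence_frames keep the sentence open. */
        t->silence_run = frame_is_speech ? 0u : t->silence_run + 1u;
        if (t->silence_run >= t->max_silence_frames) {
            t->active = false;
            t->speech_run = 0;
        }
    }
    return t->active;
}
```

In this model, with 20 ms frames, a min_speech_len_ms of 60 requires three consecutive speech frames before a start is reported, and a max_speech_interval_ms of 100 ends the sentence after five consecutive silent frames. This matches the tuning guidance: a larger minimum speech length filters out short noise bursts, and a larger silence interval tolerates longer pauses within a sentence.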
If you have any problems with TuyaOS development, you can post your questions in the Tuya Developer Forum.