Audio Quality Test

Last updated on: 2025-05-14 07:33:55

Voice interaction is a fundamental feature of AI hardware. The quality of the audio captured during interaction is the foundation of product functionality and user experience. It is also a prerequisite for accurate speech recognition and for large language models to understand and respond correctly. Every AI hardware product therefore requires audio quality testing.

Audio quality test

Tuya provides basic audio data test methods that let you capture audio data during development and adjust the product's structure, hardware, and software accordingly. This helps ensure audio data quality, improve the wake-up and speech recognition rates, and optimize the interaction experience with large models.

Get data

The Wukong AI Hardware Development Framework provides basic audio test methods. Using tuyaos_demo_ai_toy as an example, follow these steps:

Enable the audio test feature: in tuya_ai_debug.h, set TUYA_UPLOAD_DEBUG to 1.
```
#define TUYA_UPLOAD_DEBUG 1
```

In tuya_ai_debug.c, change the server IP address to that of the host software, which is usually the IP address of your test computer.

```
#define TCP_SERVER_IP "192.168.32.160" // Change to the IP address of the computer running the host software
#define TCP_SERVER_PORT 5055
```

Copy scripts/ai_audio_proc.py from the project to your Windows computer.

The script requires PyAudio, so install it first (for example, with pip install pyaudio). Then run the script:
```
python ai_audio_proc.py
```
Build the firmware, flash it to the device, run it, pair the device, and then start a chat. While the device picks up and uploads sound during the chat, the firmware automatically sends the audio data from the acoustic echo cancellation (AEC) and voice activity detection (VAD) stages to the script, which saves it to the script's directory and plays it back. Listen with headphones to check whether the audio is complete, clear, and clean.

Analyze data

Use a professional audio tool such as Ocenaudio to open and analyze the captured audio files, and compare the recordings from the AEC and VAD stages.

Optimize hardware

Refer to Tuya's hardware solution, adjust the hardware structure and component positions accordingly, and purchase the recommended supporting components.

Optimize software — VAD

You can set the following VAD parameters in the default audio configuration TY_AI_AUDIO_CFG_DEF:

```
#define TY_AI_AUDIO_CFG_DEF { \
    .sample_rate = TKL_AUDIO_SAMPLE_16K, \
    .sample_bits = TKL_AUDIO_DATABITS_16, \
    .channel = TKL_AUDIO_CHANNEL_MONO, \
    .upload_slice_duration = 100, \
    .record_duration = 10000, \
    .vad_active_duration = 300, \
    .vad_pre_active_duration = 500, /* ms of audio kept from before VAD activation */ \
    .vad_inactive_duration = 500,   /* ms of silence before VAD deactivates */ \
    .vad_frame_duration = 10, \
    .vad_silence_timeout = 30000, \
}
```
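To see what these values mean in bytes and frames, the arithmetic can be written out. This is a minimal sketch, assuming that TKL_AUDIO_SAMPLE_16K, TKL_AUDIO_DATABITS_16, and TKL_AUDIO_CHANNEL_MONO denote 16 kHz, 16-bit, single-channel PCM, and that every duration field is in milliseconds (the text below confirms milliseconds only for vad_pre_active_duration and vad_inactive_duration). The helper names pcm_bytes_for_ms and vad_frames_for_ms are hypothetical, not part of TuyaOS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of PCM audio in duration_ms milliseconds. */
static uint32_t pcm_bytes_for_ms(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                                 uint32_t channels, uint32_t duration_ms)
{
    assert(bits_per_sample % 8 == 0);
    /* samples/s * bytes/sample * channels, scaled from seconds to ms */
    uint64_t bytes = (uint64_t)sample_rate_hz * (bits_per_sample / 8) * channels
                     * duration_ms / 1000;
    return (uint32_t)bytes;
}

/* Hypothetical helper: number of VAD frames covering duration_ms (rounded up). */
static uint32_t vad_frames_for_ms(uint32_t duration_ms, uint32_t frame_ms)
{
    assert(frame_ms > 0);
    return (duration_ms + frame_ms - 1) / frame_ms;
}
```

Under these assumptions, the defaults give 320 bytes per 10 ms VAD frame, 3200 bytes per 100 ms upload slice, and a 500 ms pre-activation window of 50 frames (16000 bytes).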

Initial audio loss

Initial audio loss mainly occurs when the energy at the start of an utterance is too low to trigger VAD, so VAD activates only partway through the speech. To include as much of the initial audio as possible, adjust vad_pre_active_duration, which controls how much audio from before the activation point is kept. The default value is 500 ms, which has largely resolved this issue.
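Conceptually, a pre-activation window can be implemented as a ring buffer: while VAD is inactive, only the most recent frames are kept, and when VAD activates they are emitted ahead of the live audio. The sketch below is hypothetical and is not the code in ty_vad_app; it assumes 10 ms frames of 160 samples (16 kHz mono), and preroll_buf_t and the preroll_* functions are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160      /* assumed: one 10 ms frame at 16 kHz mono */
#define PREROLL_MAX_FRAMES 100 /* storage limit; 500 ms / 10 ms needs 50 */

typedef struct {
    int16_t frames[PREROLL_MAX_FRAMES][FRAME_SAMPLES];
    uint32_t capacity; /* vad_pre_active_duration / vad_frame_duration */
    uint32_t head;     /* slot the next frame will overwrite */
    uint32_t count;    /* number of valid frames currently held */
} preroll_buf_t;

static void preroll_init(preroll_buf_t *pb, uint32_t pre_active_ms, uint32_t frame_ms)
{
    assert(frame_ms > 0);
    pb->capacity = pre_active_ms / frame_ms;
    assert(pb->capacity > 0 && pb->capacity <= PREROLL_MAX_FRAMES);
    pb->head = 0;
    pb->count = 0;
}

/* Call for every frame while VAD is inactive; the oldest frame is overwritten. */
static void preroll_push(preroll_buf_t *pb, const int16_t *frame)
{
    memcpy(pb->frames[pb->head], frame, sizeof pb->frames[0]);
    pb->head = (pb->head + 1) % pb->capacity;
    if (pb->count < pb->capacity) {
        pb->count++;
    }
}

/* Call once when VAD activates: copy the held frames into out, oldest first,
 * so they can be uploaded ahead of the live audio. Returns the frame count. */
static uint32_t preroll_flush(preroll_buf_t *pb, int16_t *out, uint32_t max_frames)
{
    assert(max_frames >= pb->count);
    uint32_t start = (pb->head + pb->capacity - pb->count) % pb->capacity;
    uint32_t n = pb->count;
    for (uint32_t i = 0; i < n; i++) {
        memcpy(out + (size_t)i * FRAME_SAMPLES,
               pb->frames[(start + i) % pb->capacity], sizeof pb->frames[0]);
    }
    pb->head = 0;
    pb->count = 0;
    return n;
}
```

A longer window recovers more of a quiet onset but costs memory proportional to the window length and adds that much extra audio to each upload.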

Pauses cause interruptions in speech

If pauses in speech cause VAD to cut off the utterance, increase vad_inactive_duration to delay VAD deactivation. This merges segmented phrases into continuous speech, but it also increases detection latency and system response time. The default value is 500 ms; try 800 ms and verify the effect.

The cloud also performs VAD on the uploaded audio. If it detects valid human speech, it interrupts the current session to enable the 'free-talk' feature.
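The effect of vad_inactive_duration can be illustrated with a small activation/hangover state machine driven by per-frame speech decisions. This is a hypothetical sketch, not ty_vad_app's logic: it assumes vad_active_duration means "continuous speech required before activation", which the text does not state, and vad_hangover_t and its functions are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t active_frames_needed;   /* assumed: vad_active_duration / frame */
    uint32_t inactive_frames_needed; /* vad_inactive_duration / frame */
    uint32_t speech_run;             /* consecutive speech frames while inactive */
    uint32_t silence_run;            /* consecutive silent frames while active */
    bool active;
} vad_hangover_t;

static void vad_hangover_init(vad_hangover_t *v, uint32_t active_ms,
                              uint32_t inactive_ms, uint32_t frame_ms)
{
    assert(frame_ms > 0);
    v->active_frames_needed = active_ms / frame_ms;
    v->inactive_frames_needed = inactive_ms / frame_ms;
    assert(v->active_frames_needed > 0 && v->inactive_frames_needed > 0);
    v->speech_run = 0;
    v->silence_run = 0;
    v->active = false;
}

/* Feed one frame-level decision; returns whether VAD is active afterwards. */
static bool vad_hangover_update(vad_hangover_t *v, bool frame_is_speech)
{
    if (!v->active) {
        v->speech_run = frame_is_speech ? v->speech_run + 1 : 0;
        if (v->speech_run >= v->active_frames_needed) {
            v->active = true;
            v->silence_run = 0;
        }
    } else {
        /* any speech frame resets the hangover countdown */
        v->silence_run = frame_is_speech ? 0 : v->silence_run + 1;
        if (v->silence_run >= v->inactive_frames_needed) {
            v->active = false;
            v->speech_run = 0;
        }
    }
    return v->active;
}
```

With 10 ms frames, a 600 ms pause ends the segment at the default 500 ms, while 800 ms bridges it; the price is up to 300 ms more delay before the end of speech is detected.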

Optimize software — AEC

Among the AEC parameters, ec_depth sets the echo cancellation depth. Generally, the louder the sound, the larger ec_depth needs to be. To adjust it, modify the value of temp_aec_info->aec_config->ec_depth in the aud_tras_drv_aec_cfg function. A value of 0x14 is generally recommended.

```
// AEC parameters
temp_aec_info->aec_config->mic_delay = 16;//0x0
temp_aec_info->aec_config->ec_depth = 0x2;//0x14
temp_aec_info->aec_config->voice_vol = 0x0d;//0xe
temp_aec_info->aec_config->ns_level = 0x5;//0x2
temp_aec_info->aec_config->ns_para = 0x02;//0x1
temp_aec_info->aec_config->drc = 0x0;//0xf
```

Optimize software — MIC gain

Increasing the MIC gain extends the microphone's pickup distance, but it also picks up more noise. Adjust the gain during debugging until you reach a good balance between the two.

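```
/**
 * @brief ai set mic volume
 *
 * @param[in] card: card number
 * @param[in] chn: channel number
 * @param[in] vol: mic volume, [0, 100]
 *
 * @return OPRT_OK on success. Others on error, please refer to tkl_error_code.h
 */
OPERATE_RET tkl_ai_set_vol(INT32_T card, TKL_AI_CHN_E chn, INT32_T vol);

// example: set channel 0 of the board audio card to the maximum mic volume
OPERATE_RET rt = tkl_ai_set_vol(TKL_AUDIO_TYPE_BOARD, 0, 100);
if (rt != OPRT_OK) {
    // handle the error; see tkl_error_code.h for the error codes
}
```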
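One way to judge the balance is to record a stretch of background noise and a stretch of speech at each candidate volume and compare them. The ratio of the two mean-square values is the signal-to-noise ratio (10·log10 of it gives dB), and a growing count of full-scale samples means the gain is too high. The helper below is a hypothetical sketch that works on raw 16-bit PCM buffers; pcm_stats and pcm_stats_t are illustrative names, independent of the TuyaOS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double mean_square; /* average power in squared sample units */
    int32_t peak;       /* largest absolute sample value */
    uint32_t clipped;   /* samples sitting at full scale */
} pcm_stats_t;

static pcm_stats_t pcm_stats(const int16_t *pcm, size_t n)
{
    pcm_stats_t st = {0.0, 0, 0};
    assert(pcm != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = pcm[i];
        int32_t a = s < 0 ? -s : s; /* int32 avoids overflow for -32768 */
        sum += (double)s * s;
        if (a > st.peak) st.peak = a;
        if (s == INT16_MAX || s == INT16_MIN) st.clipped++;
    }
    if (n > 0) st.mean_square = sum / (double)n;
    return st;
}
```

For example, if background noise measures a mean square of 100 and speech measures 1,000,000, the SNR is 10,000, or 40 dB.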

Handle the audio algorithm yourself

If you have specialized expertise and substantial experience in speech processing, you can handle front-end audio processing yourself: integrate your own voice algorithms into the T5 development framework to replace T5's built-in front-end audio processing.

Replace AEC

The entry point for front-end audio data is the aud_tras_aec function in aud_tras_drv.c. This function passes the microphone-captured audio and the reference audio from the loopback circuit to aec_proc for echo cancellation. To use your own AEC, replace this interface.

```
static bk_err_t aud_tras_aec(void)
{
    ...
    aec_proc(aec_info_pr->aec, aec_info_pr->ref_addr, aec_info_pr->mic_addr, aec_info_pr->out_addr);
    ...
}

void aec_proc(AECContext* aec, int16_t* rin, int16_t* sin, int16_t* out);
```
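The internals of aec_proc are not documented here. As one illustration of a replacement with the same data flow (reference rin, microphone sin, output out), the sketch below uses a normalized LMS (NLMS) adaptive filter, a standard echo-cancellation technique; it is not Tuya's algorithm. nlms_aec_t, nlms_aec_init, nlms_aec_proc, the tap count, and the step size are hypothetical, and the function takes an explicit sample count because the frame length aec_proc works on is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLMS_MAX_TAPS 512 /* 512 taps cover 32 ms of echo path at 16 kHz */

typedef struct {
    float w[NLMS_MAX_TAPS]; /* estimated echo path (filter coefficients) */
    float x[NLMS_MAX_TAPS]; /* reference history, x[0] = newest sample */
    size_t taps;
    float mu;               /* step size, 0 < mu < 2 */
} nlms_aec_t;

static void nlms_aec_init(nlms_aec_t *s, size_t taps, float mu)
{
    assert(taps > 0 && taps <= NLMS_MAX_TAPS);
    assert(mu > 0.0f && mu < 2.0f);
    for (size_t k = 0; k < NLMS_MAX_TAPS; k++) {
        s->w[k] = 0.0f;
        s->x[k] = 0.0f;
    }
    s->taps = taps;
    s->mu = mu;
}

/* rin = loopback reference, sin = microphone, out = mic minus estimated echo. */
static void nlms_aec_proc(nlms_aec_t *s, const int16_t *rin, const int16_t *sin,
                          int16_t *out, size_t n)
{
    const float scale = 1.0f / 32768.0f;
    for (size_t i = 0; i < n; i++) {
        /* shift the reference history and insert the newest sample */
        for (size_t k = s->taps - 1; k > 0; k--) s->x[k] = s->x[k - 1];
        s->x[0] = rin[i] * scale;

        float y = 0.0f, energy = 1e-6f; /* echo estimate; regularized energy */
        for (size_t k = 0; k < s->taps; k++) {
            y += s->w[k] * s->x[k];
            energy += s->x[k] * s->x[k];
        }

        float e = sin[i] * scale - y;   /* error = mic minus estimated echo */
        float g = s->mu * e / energy;   /* normalized step */
        for (size_t k = 0; k < s->taps; k++) s->w[k] += g * s->x[k];

        float v = e * 32768.0f;         /* back to int16 with saturation */
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        out[i] = (int16_t)v;
    }
}
```

You would call such a function in place of aec_proc inside aud_tras_aec. A production AEC typically also needs double-talk detection and residual echo suppression, which this sketch omits.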

Replace VAD

After AEC processing, the data is passed to the application layer through the put_cb callback registered with the tkl_ai_init interface. put_cb forwards the data to the VAD module, which processes it in real time to detect valid human speech. To replace the VAD, implement functionality equivalent to ty_vad_app, which is located in vendor/T5/tuyaos/tuyaos_adapter/src/misc/ty_vad_app.c.
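The algorithm inside ty_vad_app is not documented here. As a hypothetical starting point for a replacement, the per-frame speech decision could be a simple energy detector with an adaptive noise floor, whose output then drives activation and deactivation according to vad_active_duration and vad_inactive_duration. energy_vad_t, its fields, and its functions are illustrative names, and the threshold ratio and smoothing factor would need tuning on real recordings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double noise_floor; /* running estimate of background mean-square energy */
    double ratio;       /* frame is speech if energy > ratio * noise_floor */
    double alpha;       /* noise-floor smoothing factor, 0 < alpha <= 1 */
    double min_floor;   /* lower bound so the threshold never collapses to 0 */
} energy_vad_t;

static void energy_vad_init(energy_vad_t *v, double initial_floor,
                            double ratio, double alpha)
{
    assert(initial_floor > 0.0 && ratio > 1.0 && alpha > 0.0 && alpha <= 1.0);
    v->noise_floor = initial_floor;
    v->ratio = ratio;
    v->alpha = alpha;
    v->min_floor = 1.0;
}

/* Returns true if this frame of 16-bit PCM looks like speech. */
static bool energy_vad_frame(energy_vad_t *v, const int16_t *pcm, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += (double)pcm[i] * pcm[i];
    double energy = sum / (double)n;

    bool speech = energy > v->ratio * v->noise_floor;
    if (!speech) {
        /* adapt only on non-speech frames so speech does not raise the floor */
        v->noise_floor += v->alpha * (energy - v->noise_floor);
        if (v->noise_floor < v->min_floor) v->noise_floor = v->min_floor;
    }
    return speech;
}
```

Energy alone cannot tell speech from other loud sounds, so a real replacement would usually add spectral or model-based features.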

Support and help

If you have any problems with TuyaOS development, you can post your questions in the Tuya Developer Forum.
