Built-in Voice Wake-up

Last Updated on: 2026-05-27 06:37:08

Overview

This topic describes the Keyword Spotting (KWS) module, which provides keyword wake-up detection for the Wukong AI system. The module supports two integration modes:

  • Onboard microphone, through the TUTUClear or SNDX engine.
  • External voice chip, through Universal Asynchronous Receiver-Transmitter (UART).

Directory structure

kws/
├── wukong_kws.h/c               # KWS interface and core implementation
├── tutuclear/                   # TUTUClear wake word engine
│   ├── tutuclear.h/c
│   └── [model library].a
├── sndx/                        # SNDX wake word engine
│   ├── sndx.h/c
│   └── [model library].a
└── uart/                        # UART external mode (external CODEC detection)
    └── uart.h/c

Supported keywords

TUTUClear engine

  • libtutuClear_wakeup_nihaotuya_*.a: Nihao Tuya
  • libtutuClear_wakeup_xiaozhitongxue_*.a: Xiaozhi Tongxue
  • libtutuClear_wakeup_heytuya_*.a: Hey Tuya
  • libtutuClear_*_small_model.a: All-in-one model

SNDX engine

  • libsndxasr-nihaotuya.a: Nihao Tuya
  • libsndxasr-hey-tuya.a: Hey Tuya
  • libsndxasr-hey-smart-life.a: Hey SmartLife

User-defined wake words

  • Use WUKONG_KWS_UDF1, WUKONG_KWS_UDF2, and WUKONG_KWS_UDF3 to define custom wake words.

Process

Mode 1: Onboard microphone (USING_BOARD_AUDIO_INPUT=1)

┌─────────────────────────────────────────────────────────────────────────────┐
│ 1. Initialize                                                               │
│    wukong_kws_init() / wukong_kws_default_init()                            │
│    ├── Create the ring buffer (2 s, WUKONG_KWS_BUFSZ)                       │
│    ├── Create the semaphore and mutex                                       │
│    ├── Create the KWS worker thread                                         │
│    └── Call cfg.create() to initialize the engine (TUTUClear or SNDX)       │
└─────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ 2. Feed audio (about 20 ms per frame, called by the AEC or VAD module)                │
│    wukong_kws_feed_with_vad(data, datalen, vadflag)                                   │
│    ├── __wukong_kws_feed: write to the ring buffer                                    │
│    ├── VAD logic branch:                                                              │
│    │   • vad on or vad end → __wukong_kws_post (post semaphore when conditions match) │
│    │   • vad off → __wukong_kws_drop (drop stale data on silence timeout)             │
│    └── Condition: buffer ≥ 100 ms, or force post on vad end                           │
└───────────────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ 3. Run worker thread (__wukong_kws_thread)                                              │
│    while (no exit request) {                                                            │
│        sem_wait (VAD mode: block indefinitely; continuous mode: 100 ms timeout)         │
│        mutex_lock → read from the ring buffer → mutex_unlock                            │
│        If readlen > 0:                                                                  │
│            cfg.detect(ctx, buffer, readlen)                                             │
│            on wakeup: reset engine + reset ring buffer + publish EVENT_WUKONG_KWS_WAKEUP│
│    }                                                                                    │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│ 4. Publish event                                                                              │
│    ty_publish_event(EVENT_WUKONG_KWS_WAKEUP, WUKONG_KWS_INDEX_E)                              │
│    Subscribers, such as the mode module, handle the wake-up and switch the conversation state.│
└───────────────────────────────────────────────────────────────────────────────────────────────┘
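
The feed step above can be modelled without threads to show when the worker would be signalled. This is a minimal sketch, not the module's code: the names prefixed with sketch_ and SKETCH_ are hypothetical, the byte sizes assume 16 kHz, 16-bit mono PCM, and the overflow and silence handling are simplified assumptions.

```c
#include <stddef.h>

/* Derived from the audio format: 16 kHz x 2 bytes x 1 channel = 32 bytes per ms. */
#define BYTES_PER_MS    32u
#define RING_CAPACITY   (2000u * BYTES_PER_MS)  /* 2 s ring buffer */
#define POST_THRESHOLD  (100u * BYTES_PER_MS)   /* 100 ms of audio */

/* Hypothetical VAD states; the SDK's own encoding may differ. */
typedef enum { SKETCH_VAD_OFF, SKETCH_VAD_ON, SKETCH_VAD_END } sketch_vad_t;

typedef struct {
    size_t   buffered;  /* bytes currently waiting in the ring buffer */
    unsigned posts;     /* how many times the worker was signalled */
} sketch_kws_t;

/* Returns 1 if the worker thread would be woken for this frame, else 0. */
static int sketch_feed(sketch_kws_t *k, size_t datalen, sketch_vad_t vad)
{
    /* Write step. Assumption: a full ring buffer is capped at its capacity. */
    k->buffered += datalen;
    if (k->buffered > RING_CAPACITY) {
        k->buffered = RING_CAPACITY;
    }

    if (vad == SKETCH_VAD_OFF) {
        return 0;  /* silence: no post (the real module may also drop stale data) */
    }
    if (k->buffered >= POST_THRESHOLD || vad == SKETCH_VAD_END) {
        k->posts++;  /* stands in for posting the semaphore */
        return 1;
    }
    return 0;
}

/* Models the worker reading everything that is buffered. */
static size_t sketch_drain(sketch_kws_t *k)
{
    size_t n = k->buffered;
    k->buffered = 0;
    return n;
}
```

With 640-byte frames (20 ms), the first four voiced frames only accumulate, and the fifth reaches the 100 ms threshold and signals the worker. A frame that arrives with the VAD end state signals the worker regardless of how much is buffered.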

Mode 2: External UART (USING_BOARD_AUDIO_INPUT=0)

External UART CODEC chip ──detects wake word──► tdl_comm_audio callback
                                          │
                                          ▼
                              __WAKE_UP_CB() → wukong_kws_event(index)
                                          │
                                          ▼
                              ty_publish_event(EVENT_WUKONG_KWS_WAKEUP, ...)

In this mode, the device does not feed local Pulse Code Modulation (PCM) data. The external CODEC reports the wake event.
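
The forwarding path in this mode might look like the sketch below. The one-byte keyword ID, the enum values, and the sketch_ functions are assumptions for illustration; the actual report format of the external chip and the real __WAKE_UP_CB() signature are not described here.

```c
#include <stdint.h>

typedef int INT_T;  /* stand-in for the SDK typedef */

/* Hypothetical subset of WUKONG_KWS_INDEX_E; real values come from wukong_kws.h. */
typedef enum {
    SKETCH_KWS_NIHAO_TUYA = 1,
    SKETCH_KWS_HEY_TUYA   = 2
} sketch_kws_index_t;

static int g_last_published = -1;  /* records the last "published" index */

/* Stand-in for wukong_kws_event(); the real one publishes EVENT_WUKONG_KWS_WAKEUP. */
static INT_T sketch_kws_event(sketch_kws_index_t index)
{
    g_last_published = (int)index;
    return 0;
}

/* Stand-in for __WAKE_UP_CB(): runs when the external chip reports a wake word.
 * The keyword_id argument is an assumed report format, not the real protocol. */
static INT_T sketch_wake_up_cb(uint8_t keyword_id)
{
    switch (keyword_id) {
    case 1:  return sketch_kws_event(SKETCH_KWS_NIHAO_TUYA);
    case 2:  return sketch_kws_event(SKETCH_KWS_HEY_TUYA);
    default: return -1;  /* unknown ID: publish nothing */
    }
}
```

The callback only translates the chip's report into a wake word index. No PCM data passes through the KWS ring buffer or worker thread in this mode.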

API reference

Initialization

/* Use the default engine (TUTUClear). */
INT_T wukong_kws_default_init(VOID);

/* Use a custom engine configuration. */
typedef struct {
    INT_T (*create)(WUKONG_KWS_CTX_T *ctx);
    INT_T (*detect)(WUKONG_KWS_CTX_T *ctx, UINT8_T *data, UINT32_T datalen);
    INT_T (*reset)(WUKONG_KWS_CTX_T *ctx);
    INT_T (*deinit)(WUKONG_KWS_CTX_T *ctx);
    UINT8_T is_detect_vad;   /* 1 = VAD throttling; 0 = continuous detection */
} WUKONG_KWS_CFG_T;

INT_T wukong_kws_init(WUKONG_KWS_CFG_T *cfg);
INT_T wukong_kws_uninit(VOID);

Control

INT_T wukong_kws_enable(VOID);
INT_T wukong_kws_disable(VOID);
INT_T wukong_kws_set_vad_detect(UINT8_T is_detect_vad);
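
To relate these controls to the worker loop in the process diagram, the sketch below models the state they would change. It assumes that is_detect_vad selects between the blocking wait and the 100 ms polling wait, and that disabling simply gates detection. The sketch_ names and SKETCH_WAIT_FOREVER are hypothetical.

```c
#include <stdint.h>

#define SKETCH_WAIT_FOREVER 0xFFFFFFFFu  /* hypothetical "block indefinitely" value */
#define SKETCH_POLL_MS      100u         /* continuous-mode timeout from the diagram */

typedef struct {
    uint8_t enabled;        /* toggled by enable/disable */
    uint8_t is_detect_vad;  /* 1 = VAD throttling, 0 = continuous detection */
} sketch_kws_state_t;

/* Timeout the worker would pass to its semaphore wait. */
static uint32_t sketch_wait_timeout_ms(const sketch_kws_state_t *s)
{
    return s->is_detect_vad ? SKETCH_WAIT_FOREVER : SKETCH_POLL_MS;
}

/* Whether buffered audio would be handed to the engine at all. */
static int sketch_should_detect(const sketch_kws_state_t *s)
{
    return s->enabled != 0;
}
```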

Data feed (onboard mode)

/* Called by the AEC or VAD module per frame. vadflag: 1 = voice, 0 = silence. */
INT_T wukong_kws_feed_with_vad(UINT8_T *data, UINT16_T datalen, UINT8_T vadflag);
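
As an illustration of per-frame feeding, the sketch below slices a PCM buffer into 20 ms frames and passes each one to the feed function. The wukong_kws_feed_with_vad() defined here is only a counting stub with the documented signature so that the example compiles on its own; in a real build it comes from wukong_kws.h. The helper sketch_feed_pcm() and the local typedefs are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the SDK typedefs. */
typedef int      INT_T;
typedef uint8_t  UINT8_T;
typedef uint16_t UINT16_T;

#define FRAME_BYTES 640u  /* 320 samples x 2 bytes = 20 ms at 16 kHz mono */

static unsigned g_frames_fed = 0;
static size_t   g_bytes_fed  = 0;

/* Counting stub; remove it when linking against the real KWS module. */
INT_T wukong_kws_feed_with_vad(UINT8_T *data, UINT16_T datalen, UINT8_T vadflag)
{
    (void)data;
    (void)vadflag;
    g_frames_fed++;
    g_bytes_fed += datalen;
    return 0;
}

/* Feeds whole 20 ms frames and returns the number of leftover bytes not fed. */
static size_t sketch_feed_pcm(UINT8_T *pcm, size_t len, UINT8_T vadflag)
{
    size_t off = 0;
    while (len - off >= FRAME_BYTES) {
        if (wukong_kws_feed_with_vad(pcm + off, (UINT16_T)FRAME_BYTES, vadflag) != 0) {
            break;  /* stop on the first error */
        }
        off += FRAME_BYTES;
    }
    return len - off;
}
```

The caller keeps any leftover bytes and prepends them to the next capture, so that every call to the feed function carries a full frame.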

Event

  • Name: EVENT_WUKONG_KWS_WAKEUP
  • Data: WUKONG_KWS_INDEX_E (wake word index)

Example subscription:

ty_subscribe_event(EVENT_WUKONG_KWS_WAKEUP, "my_module", on_kws_wakeup, SUBSCRIBE_TYPE_NORMAL);
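
A handler such as on_kws_wakeup could be shaped like the sketch below. Two things are assumptions rather than documented facts: that the callback receives a single opaque pointer, and that the pointer refers to the wake word index. Check the event header in your SDK version before relying on either. The enum is a hypothetical subset of WUKONG_KWS_INDEX_E.

```c
#include <stddef.h>

typedef int INT_T;  /* stand-in for the SDK typedef */

/* Hypothetical subset of WUKONG_KWS_INDEX_E. */
typedef enum {
    SKETCH_KWS_NIHAO_TUYA = 1,
    SKETCH_KWS_HEY_TUYA   = 2
} sketch_kws_index_t;

static int g_wake_count = 0;
static int g_last_index = -1;

/* Assumed callback shape: one opaque pointer that points at the index. */
static INT_T on_kws_wakeup(void *data)
{
    if (data == NULL) {
        return -1;  /* nothing to decode */
    }
    g_last_index = (int)*(sketch_kws_index_t *)data;
    g_wake_count++;
    /* A real handler would switch the conversation state here. */
    return 0;
}
```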

Audio requirements

  • Sample rate: 16 kHz
  • Format: 16-bit PCM, mono
  • Frame length: typically 320 samples per frame (20 ms)
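
The sizes implied by these requirements follow from simple arithmetic, sketched below. The helper names are hypothetical. Whether WUKONG_KWS_BUFSZ is expressed in bytes is an assumption; if it is, a 2 s buffer in this format would be 64000 bytes.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ    16000u
#define BYTES_PER_SAMPLE  2u      /* 16-bit PCM */
#define CHANNELS          1u      /* mono */

/* Samples in `ms` milliseconds of audio. */
static uint32_t pcm_samples_for_ms(uint32_t ms)
{
    return (SAMPLE_RATE_HZ / 1000u) * ms;
}

/* Bytes of PCM needed to hold `ms` milliseconds of audio. */
static uint32_t pcm_bytes_for_ms(uint32_t ms)
{
    return pcm_samples_for_ms(ms) * BYTES_PER_SAMPLE * CHANNELS;
}
```

For example, a 20 ms frame is 320 samples or 640 bytes, and the 100 ms post threshold in the process diagram corresponds to 3200 bytes.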

Switch engines

/* Use TUTUClear (default). */
WUKONG_KWS_CFG_T cfg = {
    .create  = TUTUClear_kws_create,
    .detect  = TUTUClear_kws_detect,
    .reset   = TUTUClear_kws_reset,
    .deinit  = TUTUClear_kws_deinit,
    .is_detect_vad = 1,
};
wukong_kws_init(&cfg);

/* Alternatively, use SNDX: assign the SNDX callbacks in place of the TUTUClear ones, then initialize. */
cfg.create = SNDX_kws_create;
cfg.detect = SNDX_kws_detect;
cfg.reset  = SNDX_kws_reset;
cfg.deinit = SNDX_kws_deinit;
wukong_kws_init(&cfg);

Customize wake words

  1. Train or obtain a wake word model. Contact Tuya or your solution provider.
  2. Add the corresponding index to WUKONG_KWS_INDEX_E.
  3. Compile the model into lib<name>.a and place it in the libs/ directory.
  4. Implement the cfg.create and cfg.detect callbacks for the new engine, and call wukong_kws_event(index) from detect when the wake word is recognized.
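
The steps above can be sketched as a pair of callbacks. Everything model-specific here is a placeholder: my_model_process() stands in for the vendor model call, SKETCH_KWS_MY_WORD for the index added in step 2, and the WUKONG_KWS_CTX_T and wukong_kws_event() definitions are local stubs so that the example compiles on its own.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the SDK typedefs. */
typedef int      INT_T;
typedef uint8_t  UINT8_T;
typedef uint32_t UINT32_T;

/* Stand-in context; the real WUKONG_KWS_CTX_T is defined in wukong_kws.h. */
typedef struct { void *priv; } WUKONG_KWS_CTX_T;

/* Hypothetical index value added to WUKONG_KWS_INDEX_E in step 2. */
enum { SKETCH_KWS_MY_WORD = 100 };

static int g_event_index = -1;

/* Stub; the real wukong_kws_event() publishes EVENT_WUKONG_KWS_WAKEUP. */
static INT_T wukong_kws_event(int index)
{
    g_event_index = index;
    return 0;
}

/* Placeholder for the model call: a frame "matches" if it starts with 0xAA. */
static int my_model_process(const UINT8_T *data, UINT32_T datalen)
{
    return datalen > 0 && data[0] == 0xAA;
}

static INT_T my_kws_create(WUKONG_KWS_CTX_T *ctx)
{
    ctx->priv = NULL;  /* a real engine would allocate its model state here */
    return 0;
}

static INT_T my_kws_detect(WUKONG_KWS_CTX_T *ctx, UINT8_T *data, UINT32_T datalen)
{
    (void)ctx;
    if (data == NULL) {
        return -1;
    }
    if (my_model_process(data, datalen)) {
        return wukong_kws_event(SKETCH_KWS_MY_WORD);  /* report the wake-up */
    }
    return 0;  /* no wake word in this chunk */
}
```

These functions would then be assigned to the create and detect members of WUKONG_KWS_CFG_T, alongside matching reset and deinit callbacks, before calling wukong_kws_init().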

Support and help

If you run into problems with TuyaOS development, post your questions in the Tuya Developer Forum.