Voice Service

Last Updated on: 2026-03-17 02:15:55

This topic describes the voice services that apply to the 1302 and 8006 voice modules only.

Basic commands

Configure AI voice capabilities (0x37 + 0x00)

To enable AI capabilities on the general firmware, the MCU includes the optional `ai` field in the JSON data of the new feature setting notification command (0x37 + 0x00). This field describes the specific AI configuration.

JSON example for configuring AI voice capabilities

"ai": {
  "type": 0,
  "scode":"aaa",
  "mode":0,
  "wkio": 15,
  "wkm": 0,
  "wkt": 5,
  "spk_io":28,
  "vol_in_io":4,
  "vol_de_io":5,
  "vol_mute_io":6,
  "up": [1,2],
  "down": [1,2],
  "vtype":0,
  "mt":60,
  "asr": 0,
  "rsize":1024,
  "ssize":1024
}

Field	Required	Description
type	Yes	Hardware solution. `0`: Self-processing mode of the MCU. `1`: Self-processing mode of the module. `2`: Cooperation between the module and the MCU.
scode	No	Solution code in the format of a string.
mode	No	Conversation mode. `0`: Press and hold the button to initiate a conversation. `1`: Press the button to interrupt the conversation. `2`: Wake up a conversation via keyword. `3`: Interrupt the conversation freely.
wkio	No	For scenarios where the module has built-in voice capabilities or uses an external third-party voice chip, if a button is needed to wake up audio capture, configure the module’s wake-up pin here (Default: 12).
wkt	No	For mode values 1, 2, or 3, set the wake-up timeout (that is, VAD detection timeout). Default: 30 seconds, range: 30 to 180 seconds.
spk_io	No	Speaker pin (Default: 28).
vol_in_io	No	Volume up pin (Default: 4).
vol_de_io	No	Volume down pin (Default: 5).
vol_mute_io	No	Mute pin (Default: 6).
up	No	An array of uplink data types the module is responsible for reporting. Multiple types can be included. Each value is a message type, such as [0,1].
down	No	An array of downlink data types the module is responsible for handling. Multiple types can be included. Each value is a message type, such as [1,2].
vtype	No	If uplink video is handled by the module, select the camera type. `0`: DVP (with USB). `1`: UVC (onboard).
mt	No	Maximum microphone pickup time when audio is present. Maximum upload time when video is present. Default: 60 seconds, range: 30 to 60 seconds.
asr	No	Specifies whether to receive text from Automatic Speech Recognition (ASR). `0`: Do not receive. `1`: Receive (Default).
rsize	No	Maximum packet size the MCU can receive.
ssize	No	Maximum packet size the module can receive.
pcm	No	Specifies whether the MCU requires PCM audio.
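
The value constraints in the table above can be checked before the JSON is sent. The following is a minimal sketch, assuming the MCU-side tooling is written in Python. `build_ai_config` is a hypothetical helper, not part of any SDK, and it covers only the fields whose allowed values the table states explicitly.

```python
# Hypothetical helper (not part of any SDK): builds the "ai" object and
# enforces the allowed values and ranges listed in the field table.
def build_ai_config(hw_type, mode=None, wkt=None, mt=None, asr=None):
    if hw_type not in (0, 1, 2):
        raise ValueError("type must be 0, 1, or 2")
    cfg = {"type": hw_type}  # "type" is the only required field

    if mode is not None:
        if mode not in (0, 1, 2, 3):
            raise ValueError("mode must be 0, 1, 2, or 3")
        cfg["mode"] = mode
    if wkt is not None:
        if not 30 <= wkt <= 180:
            raise ValueError("wkt must be 30 to 180 seconds")
        cfg["wkt"] = wkt
    if mt is not None:
        if not 30 <= mt <= 60:
            raise ValueError("mt must be 30 to 60 seconds")
        cfg["mt"] = mt
    if asr is not None:
        if asr not in (0, 1):
            raise ValueError("asr must be 0 or 1")
        cfg["asr"] = asr
    return cfg


# Keyword wake-up, 30 s wake-up timeout, 60 s maximum pickup time.
ai = build_ai_config(1, mode=2, wkt=30, mt=60, asr=1)
```

Omitted optional fields are left out of the object so that the module applies its own defaults.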

The MCU sends the following data.

Field	Bytes	Description
Header	2	0x55aa
Version	1	0x03
Command	1	0x37
Data length	2	N (length of the Data field, 1 + n)
Data	1	Subcommand: 0x00
	n	JSON data for configuring AI voice capabilities
Checksum	1	Sum of all bytes from the header up to, but not including, the checksum, modulo 256

Example:

55 AA 03 37 00 DA 00 7B 22 62 5F 6E 6D 22 3A 22 22 2C 22 61 69 22 3A 7B 22 74 79 70 65 22 3A 31 2C 22 77 6B 69 6F 22 3A 35 2C 22 73 70 6B 5F 69 6F 22 3A 32 38 2C 22 76 6F 6C 5F 69 6E 5F 69 6F 22 3A 32 32 2C 22 76 6F 6C 5F 64 65 5F 69 6F 22 3A 32 33 2C 22 77 6B 74 22 3A 33 30 2C 22 75 70 22 3A 5B 31 5D 2C 22 64 6F 77 6E 22 3A 5B 31 5D 2C 22 61 73 72 22 3A 31 2C 22 70 63 6D 22 3A 30 2C 22 72 73 69 7A 65 22 3A 36 31 34 34 2C 22 73 73 69 7A 65 22 3A 31 30 32 34 2C 22 6D 6F 64 65 22 3A 32 2C 22 6D 74 22 3A 36 30 2C 22 76 6F 6C 5F 6D 75 74 65 5F 69 6F 22 3A 32 36 7D 2C 22 73 62 75 73 22 3A 7B 22 70 6F 72 74 22 3A 30 2C 22 70 72 6F 74 22 3A 30 2C 22 63 73 22 3A 30 7D 7D F0
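
The frame layout above can be sketched in Python as follows. The helper names `checksum` and `build_frame` are illustrative and not from any SDK. The JSON content is decoded from the hex example above, including the `b_nm` and `sbus` keys, which this topic does not describe. Compact separators and the key order are chosen to match the example bytes, and the big-endian data length is inferred from the `00 DA` bytes in the example.

```python
import json


def checksum(data: bytes) -> int:
    # Sum of all preceding bytes, modulo 256.
    return sum(data) % 256


def build_frame(version: int, command: int, data: bytes) -> bytes:
    # Header (0x55aa) + version + command + 2-byte big-endian length + data.
    body = bytes([0x55, 0xAA, version, command])
    body += len(data).to_bytes(2, "big") + data
    return body + bytes([checksum(body)])


# JSON content decoded from the hex example in this section.
config = {
    "b_nm": "",
    "ai": {
        "type": 1, "wkio": 5, "spk_io": 28, "vol_in_io": 22,
        "vol_de_io": 23, "wkt": 30, "up": [1], "down": [1],
        "asr": 1, "pcm": 0, "rsize": 6144, "ssize": 1024,
        "mode": 2, "mt": 60, "vol_mute_io": 26,
    },
    "sbus": {"port": 0, "prot": 0, "cs": 0},
}

# Data = subcommand 0x00 followed by the compact JSON text.
payload = b"\x00" + json.dumps(config, separators=(",", ":")).encode("ascii")
frame = build_frame(0x03, 0x37, payload)
print(frame.hex(" ").upper())
```

With this input, the data length bytes come out as 00 DA (1 subcommand byte plus 217 JSON bytes) and the checksum as F0, matching the example.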

The module returns the following data.

Field	Bytes	Description
Header	2	0x55aa
Version	1	0x00
Command	1	0x37
Data length	2	0x0002
Data	2	Subcommand (0x00) followed by the execution result. `0x00`: Success. `0x01`: Invalid data. `0x02`: Failure.
Checksum	1	Sum of all bytes from the header up to, but not including, the checksum, modulo 256

Example: 55 AA 00 37 00 02 00 00 38
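
On the MCU side, the response can be validated and decoded along these lines. This is a minimal sketch: `parse_ai_config_response` and the result strings are hypothetical names chosen for illustration, and the checks simply mirror the response table above.

```python
# Execution result codes from the response table.
RESULTS = {0x00: "success", 0x01: "invalid data", 0x02: "failure"}


def parse_ai_config_response(frame: bytes) -> str:
    # Fixed size: header(2) + version(1) + command(1) + length(2) + data(2) + checksum(1).
    if len(frame) != 9:
        raise ValueError("unexpected frame length")
    if frame[:2] != b"\x55\xaa":
        raise ValueError("bad header")
    if frame[3] != 0x37:
        raise ValueError("unexpected command")
    if int.from_bytes(frame[4:6], "big") != 2:
        raise ValueError("unexpected data length")
    if sum(frame[:-1]) % 256 != frame[-1]:
        raise ValueError("checksum mismatch")
    if frame[6] != 0x00:
        raise ValueError("unexpected subcommand")
    if frame[7] not in RESULTS:
        raise ValueError("unknown execution result")
    return RESULTS[frame[7]]


# The example response from this section.
result = parse_ai_config_response(bytes.fromhex("55 AA 00 37 00 02 00 00 38"))
print(result)
```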
