Voice Services

Last Updated on: 2023-02-13 03:09:24

AI voice technologies enable hands-free device control and open up some fascinating possibilities. This topic describes the voice services provided in TuyaOS Central Control SDK and how to use the APIs.

Background

The voice services depend on the cloud. The central control device uploads voice data to the cloud, and the cloud processes the data and sends back the result. The voice services include the following capabilities:

Upload voice data to the cloud.
Send text-to-speech (TTS) audio to the central control device.
Send music or FM radio to the central control device.
Send automatic speech recognition (ASR) text output to the central control device.
Send TTS text to the central control device.
Get the details of alarm clocks.
Pair devices by voice.
Set a nickname.
Voice system data points (DPs), including volume setting, alarm clock/reminder, mic on/off, play/pause, previous/next, and Bluetooth playback on/off.

Enable voice services

Function prototype	OPERATE_RET home_control_user_voice_sevice_start(VOID)
Function description	This is a demo function, used to enable voice services.
Parameter	-
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Example

OPERATE_RET home_control_user_voice_sevice_start(VOID)
{
    OPERATE_RET rt = OPRT_OK;
    TY_SPEAKER_CLOUD_CBS_S speaker_cloud_cbs;
    TY_VOICE_CAPABLE_CBS_S voice_cap = {0};
    // Register callbacks for handling commands received from the cloud.
    memset(&speaker_cloud_cbs, 0x0, sizeof(TY_SPEAKER_CLOUD_CBS_S));
    speaker_cloud_cbs.rev_media_cb = __voice_rcv_cloud_media_cb;
    speaker_cloud_cbs.sync_audio_cb = __voice_rcv_sync_audio_cb;
    speaker_cloud_cbs.thing_config_cb = __voice_rev_cloud_thing_config_cb;
    speaker_cloud_cbs.nick_name_cb = __voice_rev_cloud_set_nick_name_cb;
    speaker_cloud_cbs.rev_ext_custom_cb = __voice_rcv_cloud_ext_custom_cb;
    rt = tuya_speaker_cloud_register(&speaker_cloud_cbs);
    if (OPRT_OK != rt) {
        PR_ERR("tuya_speaker_cloud_register error, rt: %d", rt);
        return rt;
    }

    // Initialize voice upload channel.
    SPEAKER_UPLOAD_CONFIG_S upload_config = SPEAKER_UPLOAD_CONFIG_FOR_SPEEX();
    upload_config.report_stat_cb          = __voice_upload_media_status_cb;
    rt = speaker_intf_upload_init(&upload_config);
    if (OPRT_OK != rt) {
        PR_ERR("speaker_upload_init error:%d", rt);
        return rt;
    }

    // Register callbacks for voice system DPs.
    user_voice_system_dp_register();
    return OPRT_OK;
}

Register callbacks on initialization

Volume setting callback

Function prototype	VOID (*TY_VOICE_VOL_CTL_CB)(UINT8_T vol)
Function description	Set the volume through the voice system DP 203.
Parameter	`vol`: The volume level, ranging from 0 to 100.
Return value	VOID
Detailed description	After the setting is done, call `tuya_voice_capable_report_vol` to synchronize the current volume level with the cloud.
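
The flow described above (apply the volume, then report it) can be sketched as follows. This is a minimal illustration, not SDK code: `app_audio_set_volume` is a hypothetical driver hook, `stub_report_vol` stands in for `tuya_voice_capable_report_vol`, and the type definitions are placeholders so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Stand-ins for SDK types so the sketch compiles without the SDK headers. */
typedef uint8_t UINT8_T;
typedef int OPERATE_RET;
#define OPRT_OK 0

/* Hypothetical application state (not part of the SDK). */
static UINT8_T g_current_vol = 50;
static UINT8_T g_last_reported_vol = 0;

/* Stub standing in for tuya_voice_capable_report_vol(); it only records the value. */
static OPERATE_RET stub_report_vol(UINT8_T vol)
{
    g_last_reported_vol = vol;
    return OPRT_OK;
}

/* Hypothetical audio hook; a real device would set the codec gain here. */
static void app_audio_set_volume(UINT8_T vol)
{
    g_current_vol = vol;
}

/* Same shape as TY_VOICE_VOL_CTL_CB: VOID (*)(UINT8_T vol). */
static void user_voice_vol_cb(UINT8_T vol)
{
    if (vol > 100) {
        vol = 100; /* The documented range is 0 to 100. */
    }
    app_audio_set_volume(vol);
    /* Report the level that was actually applied. */
    (void)stub_report_vol(g_current_vol);
}
```

Clamping before applying keeps the reported value consistent with the documented range even if an out-of-range value arrives.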

DP control command callback

Function prototype	VOID (*TY_VOICE_CTL_CB)(TY_VOICE_CTL_E ctl)
Function description	The cloud sends DP commands to control the voice system of the central control device.
Parameter	`ctl`: The control command.
Return value	VOID
Detailed description	The `TY_VOICE_CTL_E` enum includes all the voice system commands. `TY_VOICE_MIC_OPEN`: Turn on mic. `TY_VOICE_MIC_CLOSE`: Turn off mic. `TY_VOICE_BT_PLAY_OPEN`: Turn on Bluetooth playback. `TY_VOICE_BT_PLAY_CLOSE`: Turn off Bluetooth playback. `TY_VOICE_PLAY_START`: Start playing the media. `TY_VOICE_PLAY_PAUSE`: Pause the media. `TY_VOICE_PLAY_PREV`: Go to the previous item. `TY_VOICE_PLAY_NEXT`: Go to the next item. After the setting is done, call `tuya_voice_capable_report_ctl` to synchronize the current status with the cloud.
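
A control callback typically maps each command to a local state change and then reports the command back. The sketch below assumes a hypothetical `app_voice_state_t` and uses `stub_report_ctl` in place of `tuya_voice_capable_report_ctl`. The enum is a stand-in: the member names come from the table above, but the numeric values and the member order are assumptions.

```c
#include <stdbool.h>

/* Stand-in for the SDK enum; values and order are assumptions. */
typedef enum {
    TY_VOICE_MIC_OPEN,
    TY_VOICE_MIC_CLOSE,
    TY_VOICE_BT_PLAY_OPEN,
    TY_VOICE_BT_PLAY_CLOSE,
    TY_VOICE_PLAY_START,
    TY_VOICE_PLAY_PAUSE,
    TY_VOICE_PLAY_PREV,
    TY_VOICE_PLAY_NEXT
} TY_VOICE_CTL_E;

/* Hypothetical application state (not part of the SDK). */
typedef struct {
    bool mic_on;
    bool bt_play_on;
    bool playing;
    int  track_index;
} app_voice_state_t;

static app_voice_state_t g_state = { true, false, false, 0 };
static TY_VOICE_CTL_E g_last_reported;
static int g_report_count;

/* Stub standing in for tuya_voice_capable_report_ctl(). */
static int stub_report_ctl(TY_VOICE_CTL_E ctl)
{
    g_last_reported = ctl;
    g_report_count++;
    return 0;
}

/* Same shape as TY_VOICE_CTL_CB: VOID (*)(TY_VOICE_CTL_E ctl). */
static void user_voice_ctl_cb(TY_VOICE_CTL_E ctl)
{
    switch (ctl) {
    case TY_VOICE_MIC_OPEN:      g_state.mic_on = true;      break;
    case TY_VOICE_MIC_CLOSE:     g_state.mic_on = false;     break;
    case TY_VOICE_BT_PLAY_OPEN:  g_state.bt_play_on = true;  break;
    case TY_VOICE_BT_PLAY_CLOSE: g_state.bt_play_on = false; break;
    case TY_VOICE_PLAY_START:    g_state.playing = true;     break;
    case TY_VOICE_PLAY_PAUSE:    g_state.playing = false;    break;
    case TY_VOICE_PLAY_PREV:
        if (g_state.track_index > 0) {
            g_state.track_index--;
        }
        break;
    case TY_VOICE_PLAY_NEXT:     g_state.track_index++;      break;
    default:
        return; /* Unknown command: nothing applied, nothing reported. */
    }
    /* Sync the handled command with the cloud. */
    (void)stub_report_ctl(ctl);
}
```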

Create local alarm clocks

Function prototype	VOID (*TY_VOICE_ALARM_CLOCK_CB)(CHAR_T *alarm)
Function description	The cloud sends a local alarm clock to the central control device through the voice system DP 207.
Parameter	`alarm`: The details of the alarm.
Return value	VOID
Detailed description	-

Voice upload status

Function prototype	VOID(*SPEAKER_UPLOAD_REPORT_STAT_CB)(SPEAKER_UPLOAD_STAT_E stat)
Function description	This callback is invoked when an error occurs on voice upload.
Parameter	`stat`: The status of voice upload.
Return value	VOID
Detailed description	`stat`: `SPEAKER_UP_STAT_NET_ERR` indicates a network error.

Receive multimedia data from the cloud

Function prototype	VOID (*TY_SPEAKER_REV_CLOUD_MEDIA_CB)(IN TY_CLOUD_MEDIA_S **pp_media_arr, IN CONST UINT_T arr_size)
Function description	This callback is invoked when the central control device receives multimedia data from the cloud. The URL and audio parameters are used to request the audio data stream.
Parameter	`pp_media_arr`: Multimedia data. `arr_size`: The number of struct arrays.
Return value	VOID
Detailed description	Parameters for multimedia data: `comm`: The audio parameter. `p_url`: The URL of the HTTP request. `p_req_body`: The body of the HTTP request.

Sub-device pairing request

Function prototype	VOID (*TY_SPEAKER_THING_CONFIG_CB)(IN TY_THING_CONFIG_MODE_T mode, IN CONST CHAR_T *token, IN CONST UINT_T timeout)
Function description	The central control device receives a request from the cloud to start or stop pairing a sub-device.
Parameter	`mode`: The pairing command. `token`: The token used to pair a Wi-Fi device. `timeout`: The pairing timeout period.
Return value	VOID
Detailed description	The pairing command: `TY_THING_CONFIG_START`: Start pairing. `TY_THING_CONFIG_STOP`: Stop pairing.
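
A pairing callback usually just records the request so that the pairing logic can act on it. The following sketch is illustrative only: `app_pairing_t`, the 64-byte token buffer, and the handling of a missing token are assumptions, and the enum is a stand-in for the SDK definition with assumed values.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the SDK enum; numeric values are assumptions. */
typedef enum {
    TY_THING_CONFIG_START,
    TY_THING_CONFIG_STOP
} TY_THING_CONFIG_MODE_T;

/* Hypothetical pairing session state (not part of the SDK). */
typedef struct {
    bool         active;
    char         token[64]; /* Buffer size is an assumption. */
    unsigned int timeout;   /* The unit is not stated in this topic. */
} app_pairing_t;

static app_pairing_t g_pairing;

/* Same shape as TY_SPEAKER_THING_CONFIG_CB. */
static void user_thing_config_cb(TY_THING_CONFIG_MODE_T mode,
                                 const char *token,
                                 unsigned int timeout)
{
    switch (mode) {
    case TY_THING_CONFIG_START:
        g_pairing.active = true;
        g_pairing.timeout = timeout;
        /* Bounded copy; an absent token is stored as an empty string. */
        snprintf(g_pairing.token, sizeof(g_pairing.token), "%s",
                 token != NULL ? token : "");
        break;
    case TY_THING_CONFIG_STOP:
        /* Drop the whole session, including the stored token. */
        memset(&g_pairing, 0, sizeof(g_pairing));
        break;
    default:
        break;
    }
}
```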

Nickname setting request

Function prototype	VOID (*TY_SPEAKER_NICK_NAME_CB)(IN CONST TY_NICK_NAME_MODE_T mode, IN CONST CHAR_T *nickname, IN CONST CHAR_T *pinyin)
Function description	The central control device receives a nickname setting request from the cloud.
Parameter	`mode`: The nickname setting command. `nickname`: The nickname in Chinese. `pinyin`: The nickname in Pinyin.
Return value	VOID
Detailed description	The nickname setting command: `TY_NICK_NAME_SET`: Add a nickname. `TY_NICK_NAME_DEL`: Delete a nickname.

Pass through custom data

Function prototype	VOID (*TY_SPEAKER_REV_CUSTOM_CB)(IN CONST CHAR_T *type, IN CONST ty_cJSON *json)
Function description	A customized 501 pass-through function to pass custom data. You can define the protocol and provide a description.
Parameter	`type`: The interface type. `json`: The JSON data.
Return value	VOID
Detailed description	-

Report the current volume level

Function prototype	OPERATE_RET tuya_voice_capable_report_vol (UINT8_T vol)
Function description	The central control device reports the current volume level to the cloud.
Parameter	`vol`: The volume level, ranging from 0 to 100.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Report the current voice system status

Function prototype	OPERATE_RET tuya_voice_capable_report_ctl (TY_VOICE_CTL_E ctl)
Function description	The central control device reports the current voice system status to the cloud.
Parameter	`ctl`: The command.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	The `TY_VOICE_CTL_E` enum includes all the voice system commands. `TY_VOICE_MIC_OPEN`: Turn on mic. `TY_VOICE_MIC_CLOSE`: Turn off mic. `TY_VOICE_BT_PLAY_OPEN`: Turn on Bluetooth playback. `TY_VOICE_BT_PLAY_CLOSE`: Turn off Bluetooth playback. `TY_VOICE_PLAY_START`: Start playing the media. `TY_VOICE_PLAY_PAUSE`: Pause the media. `TY_VOICE_PLAY_PREV`: Go to the previous item. `TY_VOICE_PLAY_NEXT`: Go to the next item.

Upload voice data

The following example opens a WAV audio file, reads the PCM audio data, and uploads it to the cloud. You can find the audio reference file in the audio_resource directory.

// Standard headers used by this example. SDK headers are omitted.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

OPERATE_RET home_control_user_voice_upload_cloud(CONST CHAR_T *pcm_file)
{
    OPERATE_RET ret = OPRT_OK;
    CONST INT_T send_len = 640;     // Bytes sent per call.
    CONST INT_T wav_head_len = 44;  // Size of a canonical PCM WAV header.
    INT_T total_len = 0;
    INT_T sent_len = 0;
    INT_T chunk_len = 0;
    BOOL_T upload_started = FALSE;
    FILE *p_file = NULL;
    CHAR_T *buf = NULL;
    struct stat statbuf;
    // Encoding parameters.
    TY_AUDIO_INFO_S audio_info = {
        .channels = 1,
        .rate = 16000,
        .bits_per_sample = 16,
    };

    PR_DEBUG("start upload test pcm");

    if (pcm_file == NULL) {
        PR_ERR("Param is invalid");
        return OPRT_INVALID_PARM;
    }

    // Open in binary mode because the file holds raw audio data.
    p_file = fopen(pcm_file, "rb");
    if (p_file == NULL) {
        PR_ERR("Unable to open file %s", pcm_file);
        return OPRT_INVALID_PARM;
    }

    // The PCM payload is everything after the WAV header.
    if (stat(pcm_file, &statbuf) != 0 || statbuf.st_size <= wav_head_len) {
        PR_ERR("Invalid file size: %s", pcm_file);
        ret = OPRT_COM_ERROR;
        goto exit;
    }
    total_len = (INT_T)(statbuf.st_size - wav_head_len);

    buf = (CHAR_T *)malloc(total_len);
    if (buf == NULL) {
        PR_ERR("Malloc error");
        ret = OPRT_MALLOC_FAILED;
        goto exit;
    }
    memset(buf, 0, total_len);

    if (fseek(p_file, wav_head_len, SEEK_SET) != 0 ||
        fread(buf, 1, total_len, p_file) != (size_t)total_len) {
        PR_ERR("Read error");
        ret = OPRT_COM_ERROR;
        goto exit;
    }

    ret = tuya_voice_upload_media_start(TY_SPEEX, &audio_info);
    if (ret != OPRT_OK) {
        PR_ERR("tuya_voice_upload_media_start failed:%d", ret);
        goto exit;
    }
    upload_started = TRUE;

    // Send full chunks, then the shorter tail, without reading past the buffer.
    while (sent_len < total_len) {
        chunk_len = total_len - sent_len;
        if (chunk_len > send_len) {
            chunk_len = send_len;
        }
        ret = tuya_voice_upload_media_send((BYTE_T *)buf + sent_len, chunk_len);
        if (ret != OPRT_OK) {
            PR_ERR("tuya_voice_upload_media_send failed:%d", ret);
            goto exit;
        }
        sent_len += chunk_len;
    }

exit:
    // Stop the upload only if it was started.
    if (upload_started) {
        tuya_voice_upload_media_stop();
    }
    if (p_file) {
        fclose(p_file);
    }
    if (buf) {
        free(buf);
    }
    return ret;
}

Voice data upload supports only a single thread. Each upload involves three steps: start, upload, and stop.
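
The example skips a fixed 44 bytes, which matches only the canonical PCM WAV header layout. A file with extra chunks before the audio data would be read at the wrong offset. The sketch below checks the header before upload. It is not part of the SDK: `wav_info_t`, `wav_parse_canonical_header`, and `wav_matches_upload_format` are hypothetical helper names, and the field offsets follow the standard RIFF/WAVE layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAV_CANONICAL_HEADER_LEN 44

/* Hypothetical container for the fields the upload cares about. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_len;
} wav_info_t;

/* WAV header fields are little-endian. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if hdr is not a canonical 44-byte PCM WAV header. */
static int wav_parse_canonical_header(const uint8_t *hdr, size_t len, wav_info_t *out)
{
    if (hdr == NULL || out == NULL || len < WAV_CANONICAL_HEADER_LEN) {
        return -1;
    }
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0) {
        return -1;
    }
    if (memcmp(hdr + 12, "fmt ", 4) != 0 || memcmp(hdr + 36, "data", 4) != 0) {
        return -1;
    }
    if (rd_le16(hdr + 20) != 1) {
        return -1; /* Audio format 1 means uncompressed PCM. */
    }
    out->channels        = rd_le16(hdr + 22);
    out->sample_rate     = rd_le32(hdr + 24);
    out->bits_per_sample = rd_le16(hdr + 34);
    out->data_len        = rd_le32(hdr + 40);
    return 0;
}

/* True if the file matches the mono, 16 kHz, 16-bit format used in the example. */
static int wav_matches_upload_format(const wav_info_t *w)
{
    return w->channels == 1 && w->sample_rate == 16000 && w->bits_per_sample == 16;
}
```

Reading the first 44 bytes of the file and passing them to `wav_parse_canonical_header` before the upload loop would catch both a non-canonical header and a mismatched audio format.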

Initialize voice upload

Function prototype	OPERATE_RET tuya_voice_upload_media_start (IN CONST TY_MEDIA_ENCODE_T type, IN CONST TY_AUDIO_INFO_S *info)
Function description	Enable uploading voice data to the cloud.
Parameter	`info`: Audio parameters. `type`: The encoding type.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	The audio parameters: `channels`: The audio channel. Set it to `1` for mono. `sample_freq`: The sample rate. Set it to 16000. `bit_depth`: The sample depth. Set it to 16 bits.
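
With the audio parameters above (mono, 16000 Hz, 16 bits), raw PCM is produced at 16000 × 1 × 2 = 32000 bytes per second, so the 640-byte chunk used in the upload example holds 20 ms of audio. The helpers below only illustrate this arithmetic. Their names are hypothetical and they do not call the SDK.

```c
#include <stdint.h>

/* Bytes of raw PCM produced per second of audio. */
static uint32_t pcm_bytes_per_second(uint32_t sample_rate,
                                     uint16_t channels,
                                     uint16_t bits_per_sample)
{
    return sample_rate * channels * (bits_per_sample / 8u);
}

/* Duration in milliseconds covered by chunk_len bytes of PCM.
 * Returns 0 if the parameters describe no audio at all. */
static uint32_t pcm_chunk_duration_ms(uint32_t chunk_len,
                                      uint32_t sample_rate,
                                      uint16_t channels,
                                      uint16_t bits_per_sample)
{
    uint32_t bps = pcm_bytes_per_second(sample_rate, channels, bits_per_sample);
    if (bps == 0) {
        return 0;
    }
    /* Widen to 64 bits so that chunk_len * 1000 cannot overflow. */
    return (uint32_t)(((uint64_t)chunk_len * 1000u) / bps);
}
```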

Upload voice data

Function prototype	OPERATE_RET tuya_voice_upload_media_send(IN CONST BYTE_T *p_buf, IN CONST UINT_T buf_len)
Function description	The central control device uploads voice data to the cloud, writing PCM raw stream to the buffer.
Parameter	`p_buf`: The buffer address of PCM raw stream. `buf_len`: The length of PCM raw stream.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Stop uploading voice data

Function prototype	OPERATE_RET tuya_voice_upload_media_stop(VOID)
Function description	The central control device stops uploading voice data and frees resources.
Parameter	VOID
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Free the resources for multimedia reception

Function prototype	OPERATE_RET tuya_speaker_free_cloud_media_arr (IN TY_CLOUD_MEDIA_S **pp_media_arr, IN CONST UINT_T arr_size)
Function description	The central control device frees the resources used for multimedia reception.
Parameter	`pp_media_arr`: Same as the parameter of `__voice_rcv_cloud_media_cb`. `arr_size`: Same as the parameter of `__voice_rcv_cloud_media_cb`.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	When `__voice_rcv_cloud_media_cb` is invoked, the application downloads the requested audio file, such as TTS or music. After the playback is finished, call this function to free the resources.
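
The receive, play, and free sequence can be sketched as follows. This is an illustration under several assumptions: `media_item_t` is a stand-in for `TY_CLOUD_MEDIA_S` that keeps only the `p_url` and `p_req_body` fields named in this topic, the array is treated as an array of pointers, `app_play_url` is a hypothetical synchronous player, and `stub_free_media_arr` stands in for `tuya_speaker_free_cloud_media_arr`.

```c
#include <stddef.h>

/* Stand-in for TY_CLOUD_MEDIA_S; the `comm` field is omitted. */
typedef struct {
    const char *p_url;
    const char *p_req_body;
} media_item_t;

static unsigned int g_played;
static unsigned int g_freed;

/* Hypothetical player: pretends to fetch and play the URL synchronously. */
static int app_play_url(const char *url, const char *req_body)
{
    (void)req_body;
    if (url == NULL) {
        return -1;
    }
    g_played++;
    return 0;
}

/* Stub standing in for tuya_speaker_free_cloud_media_arr(). */
static int stub_free_media_arr(media_item_t **pp_media_arr, unsigned int arr_size)
{
    (void)pp_media_arr;
    g_freed += arr_size;
    return 0;
}

/* Same shape as the media callback: walk the array, play each item, then free. */
static void user_media_cb(media_item_t **pp_media_arr, unsigned int arr_size)
{
    if (pp_media_arr == NULL || arr_size == 0) {
        return;
    }
    for (unsigned int i = 0; i < arr_size; i++) {
        const media_item_t *item = pp_media_arr[i];
        if (item == NULL) {
            continue; /* Skip empty slots. */
        }
        (void)app_play_url(item->p_url, item->p_req_body);
    }
    /* Free the array once, after all items have been handled. */
    (void)stub_free_media_arr(pp_media_arr, arr_size);
}
```

In a real application, playback is usually asynchronous, so the free call would move to the point where playback of the last item has finished.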

Notify the cloud of the end of a conversation

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_complete_tts(IN CONST CHAR_T *p_callback_val)
Function description	The central control device notifies the cloud that the current conversation ended.
Parameter	`p_callback_val`: The type of conversation.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	The user can hold a continuous conversation with the voice assistant by a single use of a wake word. In this case, when TTS playback is finished, this interface is called to notify the cloud that the current conversation is closed so that a new conversation can be initiated.

Pair devices by voice

Start pairing

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_thing_config_request (VOID)
Function description	The central control device reports the start of pairing to the cloud.
Parameter	VOID
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Stop pairing

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_thing_config_stop (VOID)
Function description	The central control device reports the stop of pairing to the cloud.
Parameter	VOID
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Reject pairing

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_thing_config_reject (VOID)
Function description	The central control device reports the rejection of pairing to the cloud.
Parameter	VOID
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Report the number of paired devices

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_thing_config_access_count (IN CONST INT_T count)
Function description	The central control device reports the number of paired devices to the cloud.
Parameter	`count`: The number of paired devices.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	-

Report the result of nickname setting

Function prototype	OPERATE_RET tuya_speaker_mqtt_report_nick_name (IN CONST TY_NICK_NAME_MODE_T mode, IN CONST CHAR_T *nickname, IN CONST CHAR_T *pinyin, IN CONST BOOL_T set_result)
Function description	The central control device reports the result of adding or deleting a nickname.
Parameter	`mode`: The nickname setting command. `nickname`: The nickname in Chinese. `pinyin`: The nickname in Pinyin. `set_result`: The result of the setting. `TRUE`: Success. `FALSE`: Failure.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	The nickname setting command: `TY_NICK_NAME_SET`: Add a nickname. `TY_NICK_NAME_DEL`: Delete a nickname.
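
A nickname request is typically handled in the nickname callback and answered with the report function above. The sketch below assumes that the device keeps a single nickname in a fixed-size buffer, which is a simplification. `app_apply_nickname` is hypothetical, `stub_report_nick_name` stands in for `tuya_speaker_mqtt_report_nick_name`, and the enum values are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the SDK enum; numeric values are assumptions. */
typedef enum {
    TY_NICK_NAME_SET,
    TY_NICK_NAME_DEL
} TY_NICK_NAME_MODE_T;

#define APP_NICK_MAX 32 /* Buffer size is an assumption. */

static char g_nick[APP_NICK_MAX];
static char g_pinyin[APP_NICK_MAX];
static bool g_last_result;

/* Stub standing in for tuya_speaker_mqtt_report_nick_name(). */
static void stub_report_nick_name(TY_NICK_NAME_MODE_T mode, const char *nickname,
                                  const char *pinyin, bool set_result)
{
    (void)mode;
    (void)nickname;
    (void)pinyin;
    g_last_result = set_result;
}

/* Hypothetical local handling: store or clear the single nickname. */
static bool app_apply_nickname(TY_NICK_NAME_MODE_T mode, const char *nickname,
                               const char *pinyin)
{
    if (nickname == NULL || pinyin == NULL) {
        return false;
    }
    if (mode == TY_NICK_NAME_SET) {
        if (strlen(nickname) >= APP_NICK_MAX || strlen(pinyin) >= APP_NICK_MAX) {
            return false; /* Does not fit in the local buffers. */
        }
        strcpy(g_nick, nickname);
        strcpy(g_pinyin, pinyin);
        return true;
    }
    if (mode == TY_NICK_NAME_DEL) {
        if (strcmp(g_nick, nickname) != 0) {
            return false; /* Nothing stored under this nickname. */
        }
        g_nick[0] = '\0';
        g_pinyin[0] = '\0';
        return true;
    }
    return false;
}

/* Same shape as TY_SPEAKER_NICK_NAME_CB: apply the request, then report the result. */
static void user_nick_name_cb(TY_NICK_NAME_MODE_T mode, const char *nickname,
                              const char *pinyin)
{
    bool ok = app_apply_nickname(mode, nickname, pinyin);
    stub_report_nick_name(mode, nickname, pinyin, ok);
}
```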

Upload text to the cloud

Function prototype	OPERATE_RET tuya_speaker_mqtt_get_tts(IN CONST CHAR_T *p_tts_content)
Function description	Upload the text content to the cloud.
Parameter	`p_tts_content`: The text content.
Return value	OPERATE_RET: `0`: Success. Other values: Failure. See the error code.
Detailed description	`__voice_rcv_cloud_media_cb` receives the TTS audio for playback.
