Gateway Failover

Last Updated on: 2025-08-08 01:50:23

This topic describes how to implement gateway failover with the Gateway Development Framework.

Background

End users can switch to a new gateway if the current gateway fails. All the sub-device data and scene configuration on the failed device can be migrated to the new one, which frees users from setting up sub-devices and scenes all over again.

How it works

Essentially, failover is about data backup and recovery. A gateway with failover enabled regularly backs up its data to the cloud. When a hardware failure occurs, the end user can replace it with a new gateway. Once the new gateway is notified to start recovery, it downloads the failed gateway's latest backup from the cloud.

An IoT gateway can support various connectivity protocols by interfacing a microcontroller with network modules. Therefore, the data on these modules, such as wireless network configuration, must also be recovered, and how this is done depends on the protocol or module in use. If you do not use Tuya’s network module, make sure the module you use supports failover.

How to

  • Call tuya_gw_replacement_enable before tuya_iot_sdk_pre_init to enable failover.
  • For gateways using Tuya’s Zigbee module, call tuya_iot_wired_wf_sdk_init to initialize the SDK, and then call tuya_gw_replacement_zigbee_init to initialize Zigbee gateway failover. The SDK then handles data backup and recovery for the Zigbee module.
  • If your gateway uses a third-party module or you want to back up and recover private data, you can register callbacks to implement backup and recovery. Call tuya_gw_replacement_register_cb to register failover callbacks. In the backup callback, call tuya_gw_flt_rpl_put_cfg_file to push the backup file. In the recovery callback, call tuya_gw_flt_rpl_get_cfg_file to extract the backup file.
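The ordering rules in the list above can be sketched with stand-in functions that only record the order in which they are called. This is a minimal sketch under assumptions: the mock_* functions are hypothetical placeholders, not the SDK's API, and the real tuya_gw_replacement_enable, tuya_iot_sdk_pre_init, tuya_iot_wired_wf_sdk_init, and tuya_gw_replacement_zigbee_init take the arguments declared in the SDK headers. Placing pre-init before tuya_iot_wired_wf_sdk_init is inferred from the function names, and "1.0.0" is only an example version string.

```c
#include <stdio.h>
#include <string.h>

/* Call log: each stand-in appends the name of the SDK call it represents. */
#define MAX_CALLS 8
static const char *call_log[MAX_CALLS];
static int call_count = 0;

static void log_call(const char *name)
{
    if (call_count < MAX_CALLS) {
        call_log[call_count++] = name;
    }
}

/* Hypothetical stand-ins; the real SDK functions have different signatures. */
static void mock_gw_replacement_enable(void) { log_call("tuya_gw_replacement_enable"); }
static void mock_iot_sdk_pre_init(void)      { log_call("tuya_iot_sdk_pre_init"); }
static void mock_iot_wired_wf_sdk_init(void) { log_call("tuya_iot_wired_wf_sdk_init"); }
static void mock_gw_replacement_zigbee_init(const char *min_ncp_version)
{
    printf("Zigbee failover, minimum NCP version %s\n", min_ncp_version);
    log_call("tuya_gw_replacement_zigbee_init");
}

/* Startup order for a gateway that uses Tuya's Zigbee module. */
static void gateway_startup(void)
{
    mock_gw_replacement_enable();             /* 1. enable failover before pre-init */
    mock_iot_sdk_pre_init();                  /* 2. SDK pre-initialization */
    mock_iot_wired_wf_sdk_init();             /* 3. SDK initialization */
    mock_gw_replacement_zigbee_init("1.0.0"); /* 4. Zigbee failover; SDK does backup/restore */
}

/* Returns the position of name in the call log, or -1 if it was never called. */
static int call_index(const char *name)
{
    for (int i = 0; i < call_count; i++) {
        if (strcmp(call_log[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

For a third-party module, step 4 would instead register callbacks with tuya_gw_replacement_register_cb, which its API notes say must happen after SDK initialization.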

Flowchart

Participants: User, Faulty Gateway, Cloud, New Gateway. Log output is shown in parentheses.

Backup trigger phase
  • Device gets connected/activated.
  • Report online status (Net state: online).
  • Backup preparation: detect unbacked state (Never back up, need backup).
  • Initiate backup to the cloud after 15 minutes.

Backup execution phase
  • Enter the GW_FLT_RPL_BACK_UP_CB callback.
  • tuya_gw_flt_rpl_put_cfg_file packages custom files and uploads the backup.
  • Backup completed.
  • Report restore completion (Restore done/Success).

Fault replacement phase
  • Initiate replacement via the app (enter the faulty gateway's SN).
  • Validate replacement conditions (same home/category/offline status).
  • Request backup data (Reboot to take effect restore procedure).
  • Power cycle to trigger the restore process.
  • Detect MAC change and initiate the fault replacement process.
  • Pull the cloud backup (gw_fr.tar.gz) to the local device and extract it to temp files.
  • Push network config/sub-device list (backup address + NCP MAC).
  • Provide metadata (virtual IDs, product_key, and ability).
  • Write Zigbee network/device tables to the NCP and local DB.
  • Manually reboot after success to activate the new Zigbee tables.
  • Restore the Zigbee network (zigChannel:X).
  • Report "replacement success" and trigger the first post-reboot backup.
  • Upload the full gateway/sub-device config (auto-backup every 7 days thereafter).
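The two timing rules in the flowchart (the first backup starts 15 minutes after an unbacked gateway comes online; automatic backups then repeat every 7 days) can be expressed as a small check. This is an illustrative sketch only: the SDK schedules backups itself, and backup_state_t and backup_due are hypothetical names. It also assumes online_since_s is reset whenever the gateway goes offline, which matches the FAQ's advice that the gateway must stay online continuously for 15 minutes.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIRST_BACKUP_DELAY_S (15u * 60u)            /* first backup 15 min after going online */
#define BACKUP_INTERVAL_S    (7u * 24u * 60u * 60u) /* automatic backup every 7 days afterward */

/* Hypothetical state a gateway might track; not an SDK structure. */
typedef struct {
    bool     ever_backed_up; /* false corresponds to the "Never back up, need backup" log */
    uint32_t online_since_s; /* assumed to be reset each time the gateway comes back online */
    uint32_t last_backup_s;  /* time of the last successful backup */
} backup_state_t;

/* Returns true when a backup should start at time now_s (assumes now_s is not earlier
 * than the stored timestamps, so the unsigned subtraction cannot wrap). */
static bool backup_due(const backup_state_t *st, uint32_t now_s)
{
    if (!st->ever_backed_up) {
        return now_s - st->online_since_s >= FIRST_BACKUP_DELAY_S;
    }
    return now_s - st->last_backup_s >= BACKUP_INTERVAL_S;
}
```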

Callbacks

Backup notification

/**
 * @brief Backup notification. In this callback, the application adds the files to be backed up to the backup package.
 *
 * @param app_ver Application version (output)
 *
 * @return OPERATE_RET OPRT_OK is success
 */
typedef OPERATE_RET(*GW_FLT_RPL_BACK_UP_CB)(OUT USHORT_T *app_ver);

Backup result

/**
 * @brief Notifies the application that file backup has finished. The application can clear its cache files at this point.
 *
 * @param result Backup result
 *
 * @return OPERATE_RET OPRT_OK is success
 */
typedef OPERATE_RET(*GW_FLT_RPL_BACK_UP_DONE_NOTIFY_CB)(IN INT_T result);

Recovery notification

/**
 * @brief Notifies the application to recover its data after gateway replacement
 *
 * @param app_ver Application version
 * @param errcode Recovery result set by the application (output). 0 means success.
 *
 * @return OPERATE_RET OPRT_OK is success
 */
typedef OPERATE_RET(*GW_FLT_RPL_RESTORE_NOTIFY_CB)(IN USHORT_T app_ver, OUT INT_T *errcode);

Phase-2 recovery notification

/**
 * @brief Second stage of gateway replacement recovery
 *
 * @return OPERATE_RET OPRT_OK is success
 */
typedef OPERATE_RET(*GW_FLT_RPL_RESTORE_STAGE2_CB)(VOID);

Recovery result

/**
 * @brief Notification of end of recovery process
 *
 * @param result Recovery result. 0 means success.
 *
 * @return OPERATE_RET OPRT_OK is success
 */
typedef OPERATE_RET(*GW_FLT_RPL_RESTORE_DONE_NOTIFY_CB)(IN INT_T result);
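To show how the five callbacks above relate, the sketch below wires plain-C stand-ins into a table and drives them in a simple sequence. It rests on stated assumptions: op_ret_t, mock_fr_cbs_t, and the simulate_* and on_* functions are hypothetical; the field names are copied from this topic's registration example; and passing the app_ver written during backup back into restore, as well as calling the restore callbacks back to back, are simplifications. The real SDK decides when each callback runs, including across reboots.

```c
#include <string.h>

/* Stand-ins for SDK types so this sketch compiles without the SDK headers. */
typedef int op_ret_t;
#define OP_OK 0

/* Plain-C versions of the five callback signatures documented above. */
typedef op_ret_t (*backup_cb_t)(unsigned short *app_ver);
typedef op_ret_t (*backup_done_cb_t)(int result);
typedef op_ret_t (*restore_cb_t)(unsigned short app_ver, int *errcode);
typedef op_ret_t (*restore_stage2_cb_t)(void);
typedef op_ret_t (*restore_done_cb_t)(int result);

/* Field names mirror the GW_FLT_RPL_CBS_S registration example. */
typedef struct {
    backup_cb_t         gw_flt_rpl_back_cb;
    backup_done_cb_t    gw_flt_rpl_back_up_done_notify_cb;
    restore_cb_t        gw_flt_rpl_restore_notify_cb;
    restore_stage2_cb_t gw_flt_rpl_restore_stage2_cb;
    restore_done_cb_t   gw_flt_rpl_restore_done_notify_cb;
} mock_fr_cbs_t;

/* Trace of invocations: B=backup, D=backup done, R=restore, 2=stage 2, F/X=done ok/failed. */
static char trace[64];

static op_ret_t on_backup(unsigned short *app_ver) { *app_ver = 1; strcat(trace, "B "); return OP_OK; }
static op_ret_t on_backup_done(int result) { (void)result; strcat(trace, "D "); return OP_OK; }
static op_ret_t on_restore(unsigned short app_ver, int *errcode)
{
    (void)app_ver;
    *errcode = 0; /* 0 means the application restored its files successfully */
    strcat(trace, "R ");
    return OP_OK;
}
static op_ret_t on_restore_stage2(void) { strcat(trace, "2 "); return OP_OK; }
static op_ret_t on_restore_done(int result) { strcat(trace, result == 0 ? "F " : "X "); return OP_OK; }

/* Faulty gateway side: backup, then backup-done with the backup result. */
static void simulate_backup(const mock_fr_cbs_t *cbs, unsigned short *app_ver)
{
    op_ret_t ret = cbs->gw_flt_rpl_back_cb(app_ver);
    cbs->gw_flt_rpl_back_up_done_notify_cb(ret);
}

/* New gateway side: restore stage 1, stage 2, then restore-done with errcode. */
static void simulate_restore(const mock_fr_cbs_t *cbs, unsigned short app_ver)
{
    int errcode = -1;
    cbs->gw_flt_rpl_restore_notify_cb(app_ver, &errcode);
    cbs->gw_flt_rpl_restore_stage2_cb();
    cbs->gw_flt_rpl_restore_done_notify_cb(errcode);
}

static const mock_fr_cbs_t demo_cbs = {
    .gw_flt_rpl_back_cb                = on_backup,
    .gw_flt_rpl_back_up_done_notify_cb = on_backup_done,
    .gw_flt_rpl_restore_notify_cb      = on_restore,
    .gw_flt_rpl_restore_stage2_cb      = on_restore_stage2,
    .gw_flt_rpl_restore_done_notify_cb = on_restore_done,
};
```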

API description

Enable failover

/**
 * @brief Enable gateway replacement feature, which is disabled by default.
 * @note This API should be called before `tuya_iot_init`.
 */
VOID tuya_gw_replacement_enable(VOID);

Initialize failover

/**
 * @brief Initiate Zigbee NCP replacement when Tuya's Zigbee module is used.
 * @note If you use Tuya's Zigbee module, the SDK backs up and restores the Zigbee NCP data.
 *       You only need to call this API with the minimum NCP version.
 *
 * @param[in] version Minimum NCP version that supports the replacement feature.
 *
 * @return OPRT_OK on success. For others on error, please refer to tuya_error_code.h
 */
OPERATE_RET tuya_gw_replacement_zigbee_init(CHAR_T *version);
#define tuya_gw_user_fault_replace_init tuya_gw_replacement_zigbee_init

Extract backup file

/**
 * @brief Extract the required files from the backup package
 *
 * @param filename Name of the file to extract from the backup package
 * @param dir Directory to extract the file to
 * @return OPERATE_RET OPRT_OK is success
 */
OPERATE_RET tuya_gw_flt_rpl_get_cfg_file(IN CONST CHAR_T *filename, IN CONST CHAR_T *dir);

Push backup file

/**
 * @brief Fill the files that need to be backed up into the package
 *
 * @param filename Name of the file to add to the backup package
 * @param dir Directory that contains the file
 * @return OPERATE_RET OPRT_OK is success
 */
OPERATE_RET tuya_gw_flt_rpl_put_cfg_file(IN CONST CHAR_T *filename, IN CONST CHAR_T *dir);

Register failover callback

/**
 * @brief Register gateway replacement callbacks. Use this API when a custom (non-Tuya) module is used.
 * @note This API should be called after `tuya_iot_init`.
 *
 * @param[in] handler callback function.
 *
 * @return OPRT_OK on success. For others on error, please refer to tuya_error_code.h
 */
OPERATE_RET tuya_gw_replacement_register_cb(GW_FLT_RPL_CBS_S *handler);

Example

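STATIC OPERATE_RET __fr_backup_cb(OUT USHORT_T *app_ver)
{
    OPERATE_RET op_ret = OPRT_OK;

    PR_DEBUG("fr backup");

    /* Add each file to the backup package and return the error on the first failure */
    op_ret = tuya_gw_flt_rpl_put_cfg_file("shadow", "/etc/");
    if (op_ret != OPRT_OK) {
        PR_ERR("put shadow err: %d", op_ret);
        return op_ret;
    }

    op_ret = tuya_gw_flt_rpl_put_cfg_file("passwd", "/etc/");
    if (op_ret != OPRT_OK) {
        PR_ERR("put passwd err: %d", op_ret);
        return op_ret;
    }

    return OPRT_OK;
}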

STATIC OPERATE_RET __fr_backup_done_cb(IN INT_T result)
{
    PR_DEBUG("fr backup done");

    return OPRT_OK;
}

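STATIC OPERATE_RET __fr_restore_stage1_cb(IN USHORT_T app_ver, OUT INT_T *errcode)
{
    OPERATE_RET op_ret = OPRT_OK;

    PR_DEBUG("fr restore stage1, app_ver: %d", app_ver);

    /* Extract each file from the backup package, stopping at the first failure */
    op_ret = tuya_gw_flt_rpl_get_cfg_file("shadow", "/etc/");
    if (op_ret == OPRT_OK) {
        op_ret = tuya_gw_flt_rpl_get_cfg_file("passwd", "/etc/");
    }

    /* Report the recovery result through errcode: 0 means success */
    *errcode = (op_ret == OPRT_OK) ? 0 : op_ret;
    if (op_ret != OPRT_OK) {
        PR_ERR("restore files err: %d", op_ret);
    }

    return OPRT_OK;
}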

STATIC OPERATE_RET __fr_restore_stage2_cb(VOID)
{
    PR_DEBUG("fr restore stage2");

    return OPRT_OK;
}

STATIC OPERATE_RET __fr_restore_done_cb(IN INT_T result)
{
    PR_DEBUG("fr restore done");

    if (result) {
        PR_ERR("restore failed");
    } else {
        PR_DEBUG("restore success");
    }

    return OPRT_OK;
}

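VOID test_gw_fault_replacement(VOID)
{
    /* STATIC so the callback table outlives this function in case the SDK keeps the pointer */
    STATIC GW_FLT_RPL_CBS_S __gw_rf_cbs = {
        .gw_flt_rpl_back_cb                = __fr_backup_cb,
        .gw_flt_rpl_back_up_done_notify_cb = __fr_backup_done_cb,
        .gw_flt_rpl_restore_notify_cb      = __fr_restore_stage1_cb,
        .gw_flt_rpl_restore_stage2_cb      = __fr_restore_stage2_cb,
        .gw_flt_rpl_restore_done_notify_cb = __fr_restore_done_cb,
    };
    OPERATE_RET op_ret = OPRT_OK;

    /* Must be called after tuya_iot_init */
    op_ret = tuya_gw_replacement_register_cb(&__gw_rf_cbs);
    if (op_ret != OPRT_OK) {
        PR_ERR("register fault replacement cb err: %d", op_ret);
    }
}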
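A variation on the example above keeps the list of files in one table, so the backup and restore callbacks cannot drift apart when files are added or removed. This is a sketch under assumptions: fr_file_t, fr_for_each_file, and the mock_put_cfg_file / mock_get_cfg_file stand-ins (which take the same filename, dir arguments as tuya_gw_flt_rpl_put_cfg_file and tuya_gw_flt_rpl_get_cfg_file) are hypothetical, and fail_on_call exists only so the error path can be exercised.

```c
#include <stddef.h>

typedef int op_ret_t; /* stand-in for OPERATE_RET */
#define OP_OK 0

/* One entry per file to carry over to the new gateway (paths from the example above). */
typedef struct {
    const char *filename;
    const char *dir;
} fr_file_t;

static const fr_file_t fr_files[] = {
    { "shadow", "/etc/" },
    { "passwd", "/etc/" },
};
#define FR_FILE_COUNT (sizeof(fr_files) / sizeof(fr_files[0]))

/* Hypothetical stand-ins for the SDK's put/get calls; fail_on_call makes the Nth call fail. */
static int put_calls = 0, get_calls = 0;
static int fail_on_call = -1;

static op_ret_t mock_put_cfg_file(const char *filename, const char *dir)
{
    (void)filename; (void)dir;
    return put_calls++ == fail_on_call ? -1 : OP_OK;
}

static op_ret_t mock_get_cfg_file(const char *filename, const char *dir)
{
    (void)filename; (void)dir;
    return get_calls++ == fail_on_call ? -1 : OP_OK;
}

/* Applies op to every file in the table; stops and returns the first error. */
static op_ret_t fr_for_each_file(op_ret_t (*op)(const char *, const char *))
{
    for (size_t i = 0; i < FR_FILE_COUNT; i++) {
        op_ret_t ret = op(fr_files[i].filename, fr_files[i].dir);
        if (ret != OP_OK) {
            return ret;
        }
    }
    return OP_OK;
}
```

With the real SDK, the backup callback would return fr_for_each_file(tuya_gw_flt_rpl_put_cfg_file), and the restore callback would derive *errcode from fr_for_each_file(tuya_gw_flt_rpl_get_cfg_file), provided the SDK signatures match the function-pointer type.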

FAQs

  • In gateway replacement scenarios, if the system prompts “No sub-devices available for migration”, first verify that the input (the faulty gateway's SN) is correct. If it is confirmed on site that the faulty gateway did have connected sub-devices, the old gateway most likely never backed up its configuration to the cloud. Make sure a gateway stays online continuously for at least 15 minutes so that its first backup can run.

    If the gateway log shows “Never back up, need backup”, the gateway has detected that its configuration has never been successfully backed up to the cloud and has triggered the automatic backup mechanism. The first configuration sync runs 15 minutes later.

  • If the Replace Faulty Gateway option is missing from the gateway's settings page in the app, enable the fault replacement interface in the SDK demo.
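When a replacement attempt fails, it can help to walk through the conditions in a fixed order: the validation step in the flowchart (same home, same category, faulty gateway offline), then the missing-backup cause described in the first FAQ. The sketch below is a troubleshooting checklist, not the cloud's actual validation logic: gw_info_t, rpl_check_t, and check_replacement are hypothetical, the order of checks is an assumption, and mapping RPL_NO_BACKUP to the “No sub-devices available for migration” prompt follows the FAQ's "most likely" explanation rather than a documented rule.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of a gateway for troubleshooting; not an SDK or cloud structure. */
typedef struct {
    const char *home_id;
    const char *category;
    bool        online;
    bool        has_cloud_backup;
} gw_info_t;

typedef enum {
    RPL_OK = 0,
    RPL_DIFFERENT_HOME,
    RPL_DIFFERENT_CATEGORY,
    RPL_OLD_GATEWAY_ONLINE,
    RPL_NO_BACKUP, /* likely cause of "No sub-devices available for migration" */
} rpl_check_t;

/* Checks the flowchart's validation conditions, then whether a backup exists. */
static rpl_check_t check_replacement(const gw_info_t *old_gw, const gw_info_t *new_gw)
{
    if (strcmp(old_gw->home_id, new_gw->home_id) != 0) {
        return RPL_DIFFERENT_HOME;
    }
    if (strcmp(old_gw->category, new_gw->category) != 0) {
        return RPL_DIFFERENT_CATEGORY;
    }
    if (old_gw->online) {
        return RPL_OLD_GATEWAY_ONLINE; /* the faulty gateway must be offline */
    }
    if (!old_gw->has_cloud_backup) {
        return RPL_NO_BACKUP;
    }
    return RPL_OK;
}
```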