Health Monitoring

Last Updated on: 2024-01-18 07:23:53

This topic describes device state monitoring and regular health checks. The framework notifies the application of any anomalies detected on the device, and the application then executes the designated actions. For example, it can send alerts to stakeholders or perform automatic repairs, such as a reset. Health monitoring is essential for ensuring the reliability and stability of your services.

Features

Types of health monitoring metrics

Query

Regularly query key metrics, such as memory usage and queue depth.

Event

Monitor anomalies and errors, such as HTTP API call errors.

Event-based monitoring subscribes to a public event ID through the framework's event service. This event is published when an exception occurs, and its input parameter is the index of the monitoring metric.
```
#define EVENT_HEALTH_ALERT      "health.alert"      // Public event ID for health monitoring 
```
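To illustrate the event-based path, the sketch below models a subscriber that receives the metric index when `EVENT_HEALTH_ALERT` is published. This topic does not show the framework's event service API, so `demo_subscribe_event`, `demo_publish_event`, the handler signature, and the payload layout (a pointer to an `int` index) are all hypothetical stand-ins, not TuyaOS functions.

```c
#include <stddef.h>
#include <string.h>

#define EVENT_HEALTH_ALERT "health.alert" /* public event ID from this topic */

/* Hypothetical handler type: receives the event payload. */
typedef int (*demo_event_cb)(void *data);

/* Hypothetical single-slot event bus standing in for the event service. */
static const char   *s_event_name = NULL;
static demo_event_cb s_event_cb   = NULL;

/* Remember one subscriber for one event name. */
int demo_subscribe_event(const char *name, demo_event_cb cb)
{
    if (name == NULL || cb == NULL) {
        return -1;
    }
    s_event_name = name;
    s_event_cb   = cb;
    return 0;
}

/* Deliver the payload to the subscriber if the event name matches. */
int demo_publish_event(const char *name, void *data)
{
    if (name == NULL || s_event_cb == NULL || strcmp(name, s_event_name) != 0) {
        return -1; /* nobody subscribed to this event */
    }
    return s_event_cb(data);
}

/* Index of the last metric reported; -1 means no alert seen yet. */
static int s_last_alert_index = -1;

/* Assumption: the payload points to an int holding the metric index. */
int on_health_alert(void *data)
{
    if (data == NULL) {
        return -1;
    }
    s_last_alert_index = *(int *)data;
    return 0;
}
```

An application following this pattern would subscribe once at startup with `demo_subscribe_event(EVENT_HEALTH_ALERT, on_health_alert)` and then decide in the handler which action to take for the reported metric.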

System health monitoring metrics

The framework comes with built-in system health metrics that check whether the framework itself is running properly. When you call the health monitoring initialization API, these metrics are automatically added to the monitoring list.

| Macro definition | Metrics | Type | Exception count (threshold) | Monitoring period (seconds) | Exception | Exception handling |
| --- | --- | --- | --- | --- | --- | --- |
| `HEALTH_RULE_FREE_MEM_SIZE` | Free memory | Query | 1 | 600 | Memory less than 8 KB | Restart the device |
| `HEALTH_RULE_MAX_MEM_SIZE` | The maximum memory that can be requested at a time | Query | 1 | 600 | / | / |
| `HEALTH_RULE_ATOP_REFUSE` | Access to cloud API denied | Event | 5 | / | / | / |
| `HEALTH_RULE_ATOP_SIGN_FAILED` | Sign error in access to cloud API | Event | 5 | / | / | / |
| `HEALTH_RULE_WORKQ_DEPTH` | Queue depth | Query | 1 | 600 | Queue depth exceeds 50 | Print system queue information for troubleshooting |
| `HEALTH_RULE_MSGQ_NUM` | Number of message queues | Query | 1 | 600 | Queue depth exceeds 50 | Print message queue information for troubleshooting |
| `HEALTH_RULE_TIMER_NUM` | Number of software timers | Query | 1 | 600 | The number exceeds 100 | / |
| `HEALTH_RULE_FEED_WATCH_DOG` | Feeding the watchdog | Query | 0 | 20 | / | / |
| `HEALTH_RULE_RUNTIME_REPT` | Report real-time device status to the cloud: timestamp, daylight saving time, free memory, and signal strength. | Query | 0 | 3,600 | / | / |

How it works

Query

Event

Development guide

How to use

The framework calls the health monitoring initialization API during system service initialization, and the query and notification callbacks are registered and handled automatically. In other words, you only need to call the system service initialization API.
To manage custom metrics, call the respective API to add, delete, or update them.

API description

Initialize health monitoring service

Create a thread for health monitoring and register the health metrics. This API will be called during system service initialization.

```
/**
 * @brief devos health init function
 *
 * @return OPRT_OK on success. Others on error, please refer to tuya_error_code.h
 */
INT_T tuya_devos_health_init_and_start();
```

Add custom metrics

```
typedef VOID (*health_notify_cb)();
typedef BOOL_T (*health_query_cb)();

/**
 * @brief add health item
 *
 * @param[in] threshold: the exception count threshold
 * @param[in] period: the monitoring period
 * @param[in] query: query cb
 * @param[in] notify: notify cb
 *
 * @return type ID. A value greater than 0 indicates success; any other value indicates failure
 */
INT_T tuya_devos_add_health_item(UINT_T threshold, UINT_T period, health_query_cb query, health_notify_cb notify);
```
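As a rough illustration of adding a custom metric, the sketch below registers a query callback and a notification callback for an invented "buffer usage" metric. To keep it compilable on its own, the base types and `tuya_devos_add_health_item` itself are replaced by local mocks, and `demo_run_one_period` is a hypothetical model of one monitoring period. It assumes that the query callback returns `TRUE` on an anomaly and that the notification fires once the anomaly count reaches the threshold; this topic does not confirm either assumption, nor the unit of `period`.

```c
#include <stddef.h>

/* Stand-ins for TuyaOS base types so the sketch compiles on its own;
 * in a real project they come from the SDK headers. */
typedef int          INT_T;
typedef unsigned int UINT_T;
typedef int          BOOL_T;
#define VOID  void
#define TRUE  1
#define FALSE 0

typedef VOID (*health_notify_cb)();
typedef BOOL_T (*health_query_cb)();

/* Mock storage for a single registered item. */
static struct {
    UINT_T threshold;
    UINT_T period;
    UINT_T count;
    health_query_cb  query;
    health_notify_cb notify;
} s_item;

/* Mock of the SDK function: stores one item and returns a type ID > 0.
 * The real implementation lives in the framework. */
INT_T tuya_devos_add_health_item(UINT_T threshold, UINT_T period,
                                 health_query_cb query, health_notify_cb notify)
{
    s_item.threshold = threshold;
    s_item.period    = period;
    s_item.count     = 0;
    s_item.query     = query;
    s_item.notify    = notify;
    return 1;
}

/* Hypothetical model of one monitoring period (see assumptions above). */
VOID demo_run_one_period(VOID)
{
    if (s_item.query != NULL && s_item.query()) {
        s_item.count++;
        if (s_item.count >= s_item.threshold && s_item.notify != NULL) {
            s_item.notify();
            s_item.count = 0; /* start counting again after notifying */
        }
    }
}

/* Hypothetical application state for the custom metric. */
static UINT_T s_buffer_used   = 0;
static INT_T  s_alerts_raised = 0;

/* Query callback: report an anomaly when the buffer is nearly full. */
BOOL_T buffer_query_cb(VOID)
{
    return (s_buffer_used > 900) ? TRUE : FALSE;
}

/* Notification callback: here it only counts alerts. */
VOID buffer_notify_cb(VOID)
{
    s_alerts_raised++;
}

/* Register the metric with threshold 3 and period 60 (assumed seconds). */
INT_T register_buffer_metric(VOID)
{
    return tuya_devos_add_health_item(3, 60, buffer_query_cb, buffer_notify_cb);
}
```

The returned type ID is worth keeping, because the delete and update APIs identify a metric by that value.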

Delete metrics

```
/**
 * @brief delete health item
 *
 * @param[in] type: the type
 *
 */
VOID tuya_devos_delete_health_item(INT_T type);
```

Update monitoring period

```
/**
 * @brief update health item period
 *
 * @param[in] type: the type
 * @param[in] period: the period
 *
 */
VOID tuya_devos_update_health_item_period(INT_T type, UINT_T period);
```

Update exception count threshold

```
/**
 * @brief update health item threshold
 *
 * @param[in] type: the type
 * @param[in] threshold: the threshold
 *
 */
VOID tuya_devos_update_health_item_threshold(INT_T type, UINT_T threshold);
```
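The delete and update APIs all take the type ID returned when the metric was added. The sketch below walks through that lifecycle against a small in-memory mock, assuming the four `tuya_devos_*` functions behave as their comments describe. The mock bodies, the `demo_*` helpers, and the "link" metric are hypothetical and exist only so the example compiles without the SDK.

```c
#include <stddef.h>

/* Stand-ins for TuyaOS base types; a real project uses the SDK headers. */
typedef int          INT_T;
typedef unsigned int UINT_T;
typedef int          BOOL_T;
#define VOID void

typedef VOID (*health_notify_cb)();
typedef BOOL_T (*health_query_cb)();

#define DEMO_MAX_ITEMS 4

typedef struct {
    BOOL_T used;
    UINT_T threshold;
    UINT_T period;
    health_query_cb  query;
    health_notify_cb notify;
} demo_item_t;

static demo_item_t s_items[DEMO_MAX_ITEMS];

/* Mock: the type ID is the slot index + 1, so valid IDs are greater than 0. */
INT_T tuya_devos_add_health_item(UINT_T threshold, UINT_T period,
                                 health_query_cb query, health_notify_cb notify)
{
    for (INT_T i = 0; i < DEMO_MAX_ITEMS; i++) {
        if (!s_items[i].used) {
            s_items[i] = (demo_item_t){1, threshold, period, query, notify};
            return i + 1;
        }
    }
    return -1; /* no free slot */
}

/* Look up a registered item by type ID; NULL if it does not exist. */
static demo_item_t *demo_find(INT_T type)
{
    if (type < 1 || type > DEMO_MAX_ITEMS || !s_items[type - 1].used) {
        return NULL;
    }
    return &s_items[type - 1];
}

VOID tuya_devos_update_health_item_period(INT_T type, UINT_T period)
{
    demo_item_t *item = demo_find(type);
    if (item != NULL) {
        item->period = period;
    }
}

VOID tuya_devos_update_health_item_threshold(INT_T type, UINT_T threshold)
{
    demo_item_t *item = demo_find(type);
    if (item != NULL) {
        item->threshold = threshold;
    }
}

VOID tuya_devos_delete_health_item(INT_T type)
{
    demo_item_t *item = demo_find(type);
    if (item != NULL) {
        item->used = 0;
    }
}

/* Hypothetical application callbacks for a "link" metric. */
static BOOL_T link_query_cb(VOID) { return 0; }
static VOID   link_notify_cb(VOID) { }

/* Add a metric, then tune it at runtime; returns the type ID. */
INT_T demo_metric_lifecycle(VOID)
{
    INT_T type = tuya_devos_add_health_item(2, 300, link_query_cb, link_notify_cb);
    if (type <= 0) {
        return type; /* registration failed */
    }
    tuya_devos_update_health_item_period(type, 60);   /* check more often */
    tuya_devos_update_health_item_threshold(type, 5); /* tolerate more anomalies */
    return type;
}
```

When the metric is no longer needed, passing the same type ID to `tuya_devos_delete_health_item` removes it from the monitoring list.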

Example

The service_health_manager example in the TuyaOS example collection contains the complete example code.

FAQs

What is the priority of health monitoring?

The health monitoring thread runs at the highest priority, THREAD_PRIO_0.

What other features does health monitoring provide apart from anomaly detection?

It also schedules watchdog feeding and runtime log uploading.
