Problem Diagnosis

Last Updated on: 2023-09-06 10:40:14

TuyaOS provides a Core Dump tool to capture the stack trace when a segmentation fault (segfault) occurs, helping you find the function call that caused the problem.

This tool is only applicable to the chip platforms that Tuya has adapted. Its usage on non-adapted platforms is not verified. It is recommended to simulate segfaults to test if this tool works for your platform. If it does not work, use your platform-specific debugger instead.

This topic describes how to implement the problem diagnosis feature with TuyaOS Gateway Development Framework.

Background

A segfault in a program can cause the device to restart. If the segfault occurs while business logic is running, the affected feature might stop working. Some of these faults are not found during testing and surface only after the device goes live.

For devices running on Linux, you can enable core dumps so that a .core file is created when a program crashes. You can load this file in GDB to trace function calls and identify the line of code that caused the problem. However, accumulated .core files consume a lot of storage space, so for IoT devices core dumps are generally enabled only in the debugging stages.
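
As a quick check during debugging, the following Python sketch reports whether the core file size limit of the current process permits core files. It uses only the standard resource module (Unix only). The helper name core_dumps_enabled is hypothetical and not part of TuyaOS, and the check does not cover other system settings that also affect where core files are written.

```python
import resource


def core_dumps_enabled(limits=None):
    """Return True if the soft core-file size limit is non-zero.

    limits is an optional (soft, hard) tuple, used here to make the
    helper easy to test; by default the live process limit is read.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_CORE)
    soft, _hard = limits
    # A soft limit of 0 means no core file is written for this process.
    return soft != 0


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core limit (soft, hard):", soft, hard)
    print("core dumps enabled:", core_dumps_enabled())
```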

Given this, TuyaOS provides a lightweight Core Dump tool to help you track down program errors on deployed devices.

How it works

The SDK handles exception signals. When an exception in the program occurs, the SDK saves the stack trace to a file that is only a few KB in size. You can use the Core Dump tool to examine the stack trace and determine the function that caused the problem by reviewing the function at the top of the stack and the function call stack.

The executable deployed on the device generally does not include a symbol table. Therefore, also build a second copy of the program with the -g option. The Core Dump tool reads the debugging information in this copy to analyze the stack trace.
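
Before analyzing a dump, it can help to confirm that the copy built with -g really carries debugging information. The following is a minimal sketch, assuming the binutils readelf command is available on the development host. It looks for a .debug_info section in the section header listing. The function names has_debug_sections and check_binary are hypothetical helpers, not part of the Core Dump tool.

```python
import shutil
import subprocess


def has_debug_sections(section_listing):
    """Return True if a `readelf -S` listing mentions a .debug_info section."""
    return any(".debug_info" in line for line in section_listing.splitlines())


def check_binary(path):
    """Run `readelf -S` on path and report whether debug info is present."""
    if shutil.which("readelf") is None:
        raise RuntimeError("readelf not found in PATH")
    result = subprocess.run(
        ["readelf", "-S", path], capture_output=True, text=True, check=True
    )
    return has_debug_sections(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python3 check_debug.py <program built with -g>
    print("debug info present:", check_binary(sys.argv[1]))
```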

Development guide

After gateway initialization, call tuya_gw_app_debug_start to enable the problem diagnosis feature to capture the stack trace when exceptions occur in the program.

When you implement local logs in Log Management, package the stack trace file into the log file so that you can get the stack information at the time of a crash.

Example

int main(int argc, char **argv)
{
    OPERATE_RET rt = OPRT_OK;

    // Initialize TuyaOS.
    TUYA_CALL_ERR_RETURN(tuya_iot_init("./"));

    // Set authorization information.
    TUYA_CALL_ERR_RETURN(tuya_iot_set_gw_prod_info(&prod_info));

    // Gateway pre-initialization.
    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_pre_init(TRUE));

    // Gateway initialization.
    TUYA_CALL_ERR_RETURN(tuya_iot_wr_wf_sdk_init(IOT_GW_NET_WIRED_WIFI, GWCM_OLD, WF_START_AP_ONLY, M_PID, M_SW_VERSION, NULL, 0));

    // Gateway startup.
    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_start());

    // Enable problem diagnosis. The parameter is the directory where the stack trace file resides.
    tuya_gw_app_debug_start("./log_dir/");

    while (1) {
        tuya_hal_system_sleep(10*1000);
    }

    return OPRT_OK;
}

When a segfault occurs, copy the stack trace file and the program compiled with debugging information to the directory of the Core Dump tool, and run Core Dump from that directory for analysis. The program with debugging information must have the same file name as the executable that ran on the device, because the tool looks it up by the file name recorded in the stack trace file.
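
The file name requirement can be checked before running the tool. The following sketch assumes the dump file layout shown in the comment of the tool's source code (a "dump text section" marker followed by mapping lines with six fields). It lists the file names recorded for executable mappings and reports which ones are missing from the tool directory. The function names are hypothetical. Unlike coredump.py, this sketch does not filter out system libraries, so entries such as libc may be reported as missing and can be ignored.

```python
import os


def mapped_binaries(dump_path):
    """Return the base names of executable mappings listed in the dump file."""
    names = []
    in_text = False
    with open(dump_path, "r") as f:
        for line in f:
            if "dump text section" in line:
                in_text = True
                continue
            fields = line.split()
            # Mapping lines look like: start-end r-xp offset dev inode path
            if in_text and len(fields) == 6 and fields[1] == "r-xp":
                name = os.path.basename(fields[5])
                if name not in names:
                    names.append(name)
    return names


def missing_binaries(dump_path, tool_dir="."):
    """Return mapped file names that do not exist in tool_dir."""
    return [
        name
        for name in mapped_binaries(dump_path)
        if not os.path.exists(os.path.join(tool_dir, name))
    ]
```

For example, missing_binaries("959_user_iot_1645100484") returns an empty list when every file named in the dump is present in the current directory.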

Run the following command to use Core Dump:

python3 coredump.py -d <dump file>

Source code

import argparse
import os

parser = argparse.ArgumentParser(description='SDK Coredump Analyzer')
parser.add_argument(
    '-d', '--dump_file', required=True, type=str, help='crash dump file')
args = parser.parse_args()

sys_so = ["libc.so", "libc-", "libpthread-", "libpthread.so", "ld-", "ld.so", "stdc++", "uClibc", "libgcc"]

'''
crash dump file format:
stack dump:
00000c00 00000001 7fd10000 00000001
stack dump End
dump text section
00400000-00897000 r-xp 00000000 00:08  237597    /var/tmp/tyZ3Gw
'''
def parse_dump_file(filename):
    is_stack = False
    is_text = False
    stack = []
    text = {}

    if not os.path.isfile(filename):
        return stack, text

    with open(filename, 'r') as f:
        for line in f:
            if line.find("stack dump:") != -1:
                is_stack = True
                continue

            if line.find("stack dump End") != -1:
                is_stack = False
                continue

            if line.find("dump text section") != -1:
                is_text = True

            if is_stack:
                stack.extend(line.split())

            if is_text and line.find("r-xp") != -1:
                text_content = line.split()
                if len(text_content) != 6:
                    print("parse text section error")
                    continue

                addr = text_content[0]
                path = text_content[-1]
                filename = os.path.basename(path)

                # Skip system shared objects.
                is_omit = False
                for so_name in sys_so:
                    if filename.find(so_name) != -1:
                        is_omit = True
                        break

                if is_omit:
                    continue

                addr_range = addr.split('-')
                if len(addr_range) != 2:
                    continue

                text[filename] = addr_range

    return stack, text

def dump_addr2line(stack, text):
    for addr in stack:
        addr = int(addr, 16)
        for name in text:
            addr_start = int(text[name][0], 16)
            addr_end = int(text[name][1], 16)
            if addr_start <= addr < addr_end:
                # An address inside a shared object must be converted to an
                # offset from the object's load address.
                if name.find(".so") != -1:
                    addr = addr - addr_start
                addr = hex(addr)
                if not os.path.exists(name):
                    print("{} is not found".format(name))
                    break
                os.system('addr2line {} -e {} -f'.format(addr, name))
                break

def main():
    dump_file = args.dump_file
    print("crash dump file: {}".format(dump_file))
    stack, text = parse_dump_file(dump_file)
    dump_addr2line(stack, text)

if __name__ == '__main__':
    main()
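
The address translation performed by dump_addr2line can be illustrated in isolation. The following sketch is a standalone helper, not part of coredump.py, and the name to_lookup_address is hypothetical. It assumes the mapping lines follow the layout of Linux /proc/<pid>/maps, where the end address is exclusive, and, like the script, that the main executable is not position independent. The first mapping in the usage lines is the one from the format comment above; the shared object range is made up for illustration.

```python
def to_lookup_address(addr, start, end, is_shared_object):
    """Return the address to pass to addr2line, or None if addr is not mapped.

    addr, start, and end are integers; the range is treated as [start, end).
    """
    if not (start <= addr < end):
        return None
    # A shared object is looked up by offset from its load address,
    # while the main executable is looked up by absolute address.
    return addr - start if is_shared_object else addr


if __name__ == "__main__":
    # Main executable mapped at 00400000-00897000.
    print(hex(to_lookup_address(0x00401234, 0x00400000, 0x00897000, False)))
    # Hypothetical shared object mapped at 76f00000-76f50000.
    print(hex(to_lookup_address(0x76F01234, 0x76F00000, 0x76F50000, True)))
```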

Example of parsing

kyson@LAPTOP-ORFJBPHU:~/workspace/tuya/tools/crash_dump$ python3 coredump.py -d 959_user_iot_1645100484
crash dump file: 959_user_iot_1645100484
__start
??:?
sig_proc
/root/workspace_temp/EmbedSDKs/ty_gw_zigbee_ext_sdk/ty_gw_zigbee_ext_sdk/sdk/svc_linux_crash_dump/src/crash_dump.c:287
??
??:0
emberAfSendDefaultResponseWithCallback
/root/workspace_temp/EmbedSDKs/ty_gw_zigbee_ext_sdk/ty_gw_zigbee_ext_sdk/sdk/zigbee_host/slabs/v2.2/protocol/zigbee/app/framework/util/util.c:764
__start
??:?
...

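The tool prints the function name and source location for each address recovered from the stack at the time of the segfault. Focus on the functions nearest the top of the stack and review the remaining entries as needed. In the example above, sig_proc is defined in crash_dump.c, which belongs to the SDK's crash dump module, and entries shown as ?? could not be resolved. The first application function in the output, emberAfSendDefaultResponseWithCallback, is therefore where the segfault occurred. With this context, you can then identify the line of code that caused the segfault.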
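
When the output is long, it can be easier to read after pairing each function name with its location and dropping unresolved entries. The following is a minimal sketch that post-processes the text printed by the tool, assuming the two-lines-per-address layout of addr2line -f seen in the example above. The function names are hypothetical, and the "crash dump file:" header line must be removed before the text is passed in.

```python
SAMPLE_OUTPUT = """\
__start
??:?
sig_proc
/root/workspace_temp/EmbedSDKs/ty_gw_zigbee_ext_sdk/ty_gw_zigbee_ext_sdk/sdk/svc_linux_crash_dump/src/crash_dump.c:287
??
??:0
emberAfSendDefaultResponseWithCallback
/root/workspace_temp/EmbedSDKs/ty_gw_zigbee_ext_sdk/ty_gw_zigbee_ext_sdk/sdk/zigbee_host/slabs/v2.2/protocol/zigbee/app/framework/util/util.c:764
__start
??:?
"""


def pair_frames(output):
    """Group `addr2line -f` output into (function, location) tuples."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # addr2line -f prints the function name, then file:line, for each address.
    return list(zip(lines[0::2], lines[1::2]))


def resolved_frames(frames):
    """Keep only frames whose source location was resolved."""
    return [(fn, loc) for fn, loc in frames if not loc.startswith("??")]


if __name__ == "__main__":
    for fn, loc in resolved_frames(pair_frames(SAMPLE_OUTPUT)):
        print("{} at {}".format(fn, loc))
```

Filtering is a reading aid only. Unresolved entries can still indicate frames in code that was built without debugging information, so keep the full output at hand.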