How to Debug Segmentation Faults

Kyson

Program crashes are an unavoidable part of software development. During development and debugging, we can use tools such as GDB or GDB Server to reproduce the problem and inspect the stack trace, which usually makes the fault easy to locate and fix.

However, intermittent crashes in deployed products are harder to handle. In principle, we can install a SIGSEGV handler and call glibc's backtrace and backtrace_symbols functions (declared in <execinfo.h>) from it to print a stack trace. Unfortunately, many embedded toolchains ship a C library built without these functions, so in most cases this approach is not available.

This article presents two ways to diagnose intermittent crashes in deployed products: Core Dump and Crash Dump.

Core Dump

Core Dump refers to saving the data and state of a program in memory in the form of a file when the program terminates abnormally. It contains a snapshot of the program's memory at the time of the crash, which can be used for subsequent debugging and analysis.

Enable Core Dump

Core dumps are disabled by default on most Linux systems (the core file size limit is 0), so they must be enabled manually.

Check the current limit, then set the core file size limit to unlimited:

$ ulimit -c
$ ulimit -c unlimited

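Note that ulimit only affects the current shell and the processes started from it, so run it in the shell or startup script that launches the program. On devices with limited storage, pass a finite size instead of unlimited so that a core file cannot fill the disk.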
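
The same limit can also be raised from inside a launcher process instead of a shell. The following is only a minimal sketch, assuming Python's standard resource module is available on the target (Unix only); enable_core_dumps is a hypothetical helper name, not part of any SDK.

```python
import resource

def enable_core_dumps(max_bytes=None):
    """Set the soft RLIMIT_CORE limit for this process and its future children.

    max_bytes=None requests the largest size the hard limit allows.
    A request above the hard limit is clamped to the hard limit.
    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if max_bytes is None or (hard != resource.RLIM_INFINITY and max_bytes > hard):
        new_soft = hard
    else:
        new_soft = max_bytes
    # Only the soft limit is changed; raising the hard limit needs privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() before starting the application with subprocess gives the child the raised limit, because resource limits are inherited.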

By default, the core file is written to the crashed process's current working directory. To store core files elsewhere, write a new pattern to core_pattern (this requires root privileges):

echo "/data/core_%e_%p" > /proc/sys/kernel/core_pattern

The format specifiers that can be used in core_pattern, including %e and %p, are as follows:

%p - insert pid into filename
%u - insert current uid into filename
%g - insert current gid into filename
%s - insert signal that caused the coredump into the filename
%t - insert UNIX time that the coredump occurred into filename
%h - insert hostname where the coredump happened into filename
%e - insert coredumping executable name into filename
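
To preview the file name a pattern would produce, the substitution can be imitated in a few lines. This is only an illustrative sketch of the idea, not the kernel's implementation; expand_core_pattern and the sample values are hypothetical.

```python
import re

def expand_core_pattern(pattern, **values):
    """Expand core_pattern-style specifiers (such as %e and %p).

    Keyword names are the specifier letters, e.g. e="app", p=1234.
    "%%" yields a literal percent sign. In this sketch, specifiers
    without a supplied value expand to an empty string.
    """
    def substitute(match):
        letter = match.group(1)
        if letter == "%":
            return "%"
        return str(values.get(letter, ""))

    return re.sub(r"%(.)", substitute, pattern)

# Sample values are made up for illustration.
print(expand_core_pattern("/data/core_%e_%p", e="tyZ3Gw", p=1234))
# prints /data/core_tyZ3Gw_1234
```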

Analyzing Core Dump

Once a core file has been generated, copy it off the device and analyze it with the gdb that belongs to the toolchain used to build the program:

$ arm-linux-gnueabihf-gdb /path/to/executable /path/to/coredump

Note that the program must be compiled with the -g option so that it contains debug information, and the executable passed to gdb must not be stripped.

In gdb, use the bt command to print the stack backtrace and find where the program crashed.

(gdb) bt
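
When many core files need to be checked, the backtrace can be collected without an interactive session. The following is a rough sketch, assuming a gdb that accepts the standard --batch and -ex options; build_gdb_command and collect_backtrace are hypothetical helper names, and the gdb argument should be set to the cross gdb of your toolchain.

```python
import subprocess

def build_gdb_command(executable, core_file, gdb="gdb", all_threads=False):
    """Build a gdb command line that prints a backtrace and then exits."""
    bt_command = "thread apply all bt" if all_threads else "bt"
    return [gdb, "--batch", "-ex", bt_command, executable, core_file]

def collect_backtrace(executable, core_file, gdb="gdb"):
    """Run gdb non-interactively and return whatever it printed to stdout."""
    result = subprocess.run(
        build_gdb_command(executable, core_file, gdb),
        capture_output=True, text=True, check=False)
    return result.stdout
```

For example, collect_backtrace("/path/to/executable", "/path/to/coredump", gdb="arm-linux-gnueabihf-gdb") would run the same analysis as the interactive session above.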

Crash Dump

Crash Dump is a lightweight Core Dump function provided by the TuyaOS gateway development framework, which can serve as an alternative to Core Dump.

Basic Principle

The TuyaOS gateway development framework installs a handler for the SIGSEGV signal. When the signal is raised, the handler saves the current stack memory, after alignment, to a file. This file occupies only a few KB of storage space.

Stack traceback works by scanning every value in the saved stack and checking whether it falls within the address range of an executable mapping of the running program. Each value that does is treated as a code address and passed to the addr2line tool, which converts it into a function name and a source file location. Because this is a scan rather than a true frame-by-frame unwind, the output can include stale addresses left on the stack by earlier calls.
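
The address check described above can be sketched as follows. This is a simplified illustration, not the framework's code; find_code_addresses and the sample addresses are hypothetical.

```python
def find_code_addresses(stack_words, text_ranges):
    """Return (address, module) pairs for stack words inside a text range.

    stack_words: iterable of hex strings, as saved in the dump file.
    text_ranges: dict mapping module name -> (start, end) as integers,
                 where end is exclusive.
    """
    hits = []
    for word in stack_words:
        addr = int(word, 16)
        for module, (start, end) in text_ranges.items():
            if start <= addr < end:
                hits.append((addr, module))
                break
    return hits

# Made-up sample data: only the last word lies inside the text range.
stack = ["00000c00", "00000001", "7fd10000", "00401a2c"]
ranges = {"tyZ3Gw": (0x00400000, 0x00897000)}
for addr, module in find_code_addresses(stack, ranges):
    print(hex(addr), module)
# prints 0x401a2c tyZ3Gw
```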

How to Use

After gateway initialization, call the tuya_gw_app_debug_start interface to enable the Crash Dump function. For example:

int main(int argc, char **argv)
{
    OPERATE_RET rt = OPRT_OK;

    TUYA_CALL_ERR_RETURN(tuya_iot_init("./"));

    TUYA_CALL_ERR_RETURN(tuya_iot_set_gw_prod_info(&prod_info));

    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_pre_init(TRUE));

    TUYA_CALL_ERR_RETURN(tuya_iot_wr_wf_sdk_init(IOT_GW_NET_WIRED_WIFI, GWCM_OLD, WF_START_AP_ONLY, M_PID, M_SW_VERSION, NULL, 0));

    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_start());

    /* Enable Crash Dump once the SDK has started */
    tuya_gw_app_debug_start("./log_dir/");

    while (1) {
        tuya_hal_system_sleep(10*1000);
    }

    return OPRT_OK;
}

How to Parse

When a segmentation fault occurs, copy the saved stack information file from the device and place it in the same directory as the program compiled with the -g option. In that directory, create a file named coredump.py, copy the script below into it, and run the following command from that directory:

python3 coredump.py -d <dump_file>

Note that the executable's file name must match the name of the program that ran on the device, because the script looks the file up by the name recorded in the dump's text section.

Here is the coredump.py script:

import argparse
import os

parser = argparse.ArgumentParser(description='SDK Coredump Analyzer')
parser.add_argument(
    '-d', '--dump_file', required=True, type=str, help='crash dump file')
args = parser.parse_args()

sys_so = ["libc.so", "libc-", "libpthread-", "libpthread.so", "ld-", "ld.so", "stdc++", "uClibc", "libgcc"]

'''
crash dump file format:
stack dump:
00000c00 00000001 7fd10000 00000001
stack dump End
dump text section
00400000-00897000 r-xp 00000000 00:08  237597    /var/tmp/tyZ3Gw
'''
def parse_dump_file(filename):
    is_stack = False
    is_text = False
    stack = []
    text = {}

    if not os.path.isfile(filename):
        return stack, text

    with open(filename, 'r') as f:
        for line in f:
            if line.find("stack dump:") != -1:
                is_stack = True
                continue

            if line.find("stack dump End") != -1:
                is_stack = False
                continue

            if line.find("dump text section") != -1:
                is_text = True

            if is_stack:
                stack.extend(line.split())

            if is_text and line.find("r-xp") != -1:
                text_content = line.split()
                if len(text_content) != 6:
                    print("parse text section error")
                    continue

                addr = text_content[0]
                path = text_content[-1]
                # Use a separate name so the dump file path argument is not shadowed
                module = os.path.basename(path)

                # Skip system libraries; only application code is resolved
                if any(so_name in module for so_name in sys_so):
                    continue

                addr_range = addr.split('-')
                if len(addr_range) != 2:
                    continue

                text[module] = addr_range

    return stack, text

def dump_addr2line(stack, text):
    for word in stack:
        addr = int(word, 16)
        for name in text:
            addr_start = int(text[name][0], 16)
            addr_end = int(text[name][1], 16)
            # The end address of a mapping is exclusive
            if addr_start <= addr < addr_end:
                # Shared objects are resolved by offset from their load address
                if name.find(".so") != -1:
                    addr = addr - addr_start
                if not os.path.exists(name):
                    print("{} is not found".format(name))
                    break
                os.system('addr2line {} -e {} -f'.format(hex(addr), name))
                break

def main():
    dump_file = args.dump_file
    print("crash dump file: {}".format(dump_file))
    stack, text = parse_dump_file(dump_file)
    dump_addr2line(stack, text)

if __name__ == '__main__':
    main()