Program crashes are inevitable in day-to-day software development. During development and debugging, we can use tools such as GDB or GDB Server to reproduce the problem and inspect the stack trace, which makes the fault easy to locate and fix.
Intermittent crashes in products that are already online are harder to handle. In principle, we can install a handler for the SIGSEGV signal and call the backtrace and backtrace_symbols interfaces inside it to print the stack trace. Unfortunately, many toolchains have had this feature removed, so in most cases backtrace cannot be used.
This article presents two methods for diagnosing intermittent crashes in online products.
Core Dump
Core Dump refers to saving the data and state of a program in memory in the form of a file when the program terminates abnormally. It contains a snapshot of the program's memory at the time of the crash, which can be used for subsequent debugging and analysis.
Enable Core Dump
By default, Linux systems typically disable core dumps (the core file size limit is 0), so we need to enable them manually.
Check the current limit, then enable core dumps by setting the core file size limit to unlimited:
$ ulimit -c
$ ulimit -c unlimited
Note that instead of unlimited, we can set a specific size limit for the core file to suit the available hardware resources.
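The limit can also be raised from inside a launcher process, so that the programs it starts inherit it. The following is only a minimal sketch using Python's standard resource module; the helper name enable_core_dumps is hypothetical, and the soft limit can only be raised as far as the hard limit already allows.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit (hypothetical helper).

    The new limit applies to this process and to any child it starts.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
if soft == resource.RLIM_INFINITY:
    print("core size limit: unlimited")
else:
    print("core size limit: {} bytes".format(soft))
```

If the hard limit itself is 0, the result stays 0 and the limit has to be raised by a privileged process instead.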
By default, the core file is written to the current working directory of the crashed process. The storage path and file name can be changed through core_pattern (this requires root privileges):
echo "/data/core_%e_%p" > /proc/sys/kernel/core_pattern
The pattern accepts format specifiers such as %e and %p. The commonly used ones are:
%p - insert pid into filename
%u - insert current uid into filename
%g - insert current gid into filename
%s - insert signal that caused the coredump into the filename
%t - insert UNIX time that the coredump occurred into filename
%h - insert hostname where the coredump happened into filename
%e - insert coredumping executable name into filename
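To preview the file name a given pattern will produce, the substitution can be imitated in a few lines. This is only an illustrative sketch: expand_core_pattern is a hypothetical helper, it handles only the specifiers supplied by the caller plus the %% escape, and the PID 1234 is made up (the executable name tyZ3Gw is taken from the sample dump shown later in this article).

```python
import re


def expand_core_pattern(pattern, values):
    """Expand %-specifiers in a core_pattern template (hypothetical helper).

    `values` maps a specifier letter (for example "e" or "p") to its value.
    """
    def substitute(match):
        spec = match.group(1)
        if spec == "%":
            return "%"  # "%%" stands for a literal percent sign
        # Specifiers that were not supplied are left untouched in this sketch.
        return str(values.get(spec, match.group(0)))

    return re.sub(r"%(.)", substitute, pattern)


# prints /data/core_tyZ3Gw_1234
print(expand_core_pattern("/data/core_%e_%p", {"e": "tyZ3Gw", "p": 1234}))
```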
Analyzing Core Dump
Once the coredump file is generated, we can use the gdb tool to analyze it.
$ arm-linux-gnueabihf-gdb /path/to/executable /path/to/coredump
Note that the program must be compiled with the -g option so that debugging information is available, and the copy used for analysis must not be stripped.
In gdb, use the bt command to print the stack backtrace and determine where the program crashed.
(gdb) bt
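When many core files need to be examined, the backtrace can be collected without an interactive session. The sketch below assumes gdb's -batch and -ex command-line options; the helper names build_gdb_command and collect_backtrace are hypothetical, and the gdb argument should name the gdb that matches the target (for example arm-linux-gnueabihf-gdb).

```python
import shutil
import subprocess


def build_gdb_command(executable, core_file, gdb="gdb"):
    """Return the argument list that runs `bt` on a core file and exits."""
    return [gdb, "-batch", "-ex", "bt", executable, core_file]


def collect_backtrace(executable, core_file, gdb="gdb"):
    """Run gdb non-interactively and return whatever it printed (hypothetical helper)."""
    if shutil.which(gdb) is None:
        raise FileNotFoundError("{} is not installed or not on PATH".format(gdb))
    result = subprocess.run(
        build_gdb_command(executable, core_file, gdb),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Example (paths are placeholders):
# print(collect_backtrace("/path/to/executable", "/path/to/coredump",
#                         gdb="arm-linux-gnueabihf-gdb"))
```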
Crash Dump
Crash Dump is a lightweight Core Dump function provided by the TuyaOS gateway development framework, which can serve as an alternative to Core Dump.
Basic Principle
The TuyaOS gateway development framework captures the SIGSEGV signal and, in the signal callback, saves the current stack memory (after alignment) to a file. This file occupies only a few KB of storage space.
To reconstruct the stack trace, the parser walks through every word saved from the stack and first checks whether it falls within the address range occupied by the program's code at run time. If it does, the addr2line tool is used to convert the address into a function name and source file location.
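The range check can be illustrated with a small worked example. It is only a sketch: the helper names are hypothetical, the mapping line and the first two stack words follow the sample dump format shown later in this article, and the word 0040a1b4 and the libfoo.so mapping are made up for illustration. As in the parsing script, addresses inside a shared object are converted to an offset from its load address.

```python
def parse_text_mapping(line):
    """Split a text-section line into (start, end, path) (hypothetical helper)."""
    fields = line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return start, end, fields[-1]


def resolve(word, mapping):
    """Return the address to pass to addr2line, or None if `word` is not code."""
    start, end, path = mapping
    addr = int(word, 16)
    if not (start <= addr < end):  # the end of a mapping is exclusive
        return None
    if ".so" in path:
        addr -= start  # shared objects are symbolized by offset
    return hex(addr)


program = parse_text_mapping(
    "00400000-00897000 r-xp 00000000 00:08 237597 /var/tmp/tyZ3Gw")
print(resolve("00000c00", program))  # None: not a code address
print(resolve("7fd10000", program))  # None: not a code address
print(resolve("0040a1b4", program))  # 0x40a1b4: candidate return address

library = parse_text_mapping(
    "76f00000-76f20000 r-xp 00000000 00:08 1234 /lib/libfoo.so")
print(resolve("76f01000", library))  # 0x1000: offset inside the library
```

Because every stack word is tested, a value that merely happens to lie inside the code range is reported as well, so the output should be read as a list of candidate frames.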
How to Use
After gateway initialization, call the tuya_gw_app_debug_start interface to enable the Crash Dump function. Here is an example:
int main(int argc, char **argv)
{
    OPERATE_RET rt = OPRT_OK;

    TUYA_CALL_ERR_RETURN(tuya_iot_init("./"));
    TUYA_CALL_ERR_RETURN(tuya_iot_set_gw_prod_info(&prod_info));
    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_pre_init(TRUE));
    TUYA_CALL_ERR_RETURN(tuya_iot_wr_wf_sdk_init(IOT_GW_NET_WIRED_WIFI, GWCM_OLD, WF_START_AP_ONLY, M_PID, M_SW_VERSION, NULL, 0));
    TUYA_CALL_ERR_RETURN(tuya_iot_sdk_start());

    /* Enable Crash Dump once the gateway has been initialized */
    tuya_gw_app_debug_start("./log_dir/");

    while (1) {
        tuya_hal_system_sleep(10*1000);
    }

    return OPRT_OK;
}
How to Parse
When a segmentation fault occurs in the program, place the saved stack information file and the program compiled with the -g option in the same directory. Then create a new file named coredump.py in that directory, copy the script below into it, and run the following command to parse the dump:
python3 coredump.py -d <dump_file>
Note that the program file must have the same name as the program running on the device, because the script looks it up by the name recorded in the dump file.
Here is the coredump.py script:
import argparse
import os
import subprocess

# Name fragments of system libraries, which are skipped during symbolization
sys_so = ["libc.so", "libc-", "libpthread-", "libpthread.so", "ld-", "ld.so", "stdc++", "uClibc", "libgcc"]

'''
crash dump file format:
stack dump:
00000c00 00000001 7fd10000 00000001
stack dump End
dump text section
00400000-00897000 r-xp 00000000 00:08 237597 /var/tmp/tyZ3Gw
'''


def parse_dump_file(filename):
    """Return the saved stack words and a {module name: [start, end]} map."""
    is_stack = False
    is_text = False
    stack = []
    text = {}

    if not os.path.isfile(filename):
        return stack, text

    with open(filename, 'r') as f:
        for line in f:
            if "stack dump:" in line:
                is_stack = True
                continue
            if "stack dump End" in line:
                is_stack = False
                continue
            if "dump text section" in line:
                is_text = True

            if is_stack:
                stack.extend(line.split())

            if is_text and "r-xp" in line:
                text_content = line.split()
                if len(text_content) != 6:
                    print("parse text section error")
                    continue
                addr = text_content[0]
                path = text_content[-1]
                name = os.path.basename(path)

                # Filter out system shared objects
                if any(so_name in name for so_name in sys_so):
                    continue

                addr_range = addr.split('-')
                if len(addr_range) != 2:
                    continue
                text[name] = addr_range

    return stack, text


def dump_addr2line(stack, text):
    """Run addr2line on every stack word that falls inside a text mapping."""
    for word in stack:
        try:
            addr = int(word, 16)
        except ValueError:
            # Skip tokens that are not hexadecimal words
            continue
        for name in text:
            addr_start = int(text[name][0], 16)
            addr_end = int(text[name][1], 16)
            # The end address of a mapping is exclusive
            if addr_start <= addr < addr_end:
                # Addresses in a shared object are relative to its load address
                if ".so" in name:
                    addr = addr - addr_start
                if not os.path.exists(name):
                    print("{} is not found".format(name))
                    break
                # Pass arguments as a list so the file name is never parsed by a shell
                subprocess.run(['addr2line', hex(addr), '-e', name, '-f'])
                break


def main():
    parser = argparse.ArgumentParser(description='SDK Coredump Analyzer')
    parser.add_argument(
        '-d', '--dump_file', required=True, type=str, help='crash dump file')
    args = parser.parse_args()

    dump_file = args.dump_file
    print("crash dump file: {}".format(dump_file))
    stack, text = parse_dump_file(dump_file)
    dump_addr2line(stack, text)


if __name__ == '__main__':
    main()