CS Basics: How to Enable coredump

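1. What is a coredump

  • A coredump is a file the operating system creates when a program terminates unexpectedly.
  • It contains a snapshot of the program's memory and the CPU register state at the moment of the crash.
  • A coredump is generated only when both of the following hold:
    • The program receives a signal whose default action is to dump core (such as SIGSEGV or SIGABRT)
    • The operating system is configured to allow coredump generation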
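
  • As a rough illustration of the first condition, the sketch below encodes the signals whose default action on Linux is to terminate the process and dump core, as listed in signal(7). The function name dumps_core_by_default is made up for this example.

```shell
# Return 0 if the signal's default action on Linux is "terminate and dump core".
# The list follows signal(7); the function name is hypothetical.
dumps_core_by_default() {
    case "${1#SIG}" in
        ABRT|BUS|FPE|ILL|QUIT|SEGV|SYS|TRAP|XCPU|XFSZ) return 0 ;;
        *) return 1 ;;
    esac
}

# Classify a few common signals.
for sig in SIGSEGV SIGABRT SIGFPE SIGTERM SIGKILL; do
    if dumps_core_by_default "$sig"; then
        echo "$sig: dumps core by default"
    else
        echo "$sig: does not dump core by default"
    fi
done
```

  • SIGTERM and SIGKILL terminate the process but do not produce a coredump, which is why a program stopped with a plain kill leaves no core file behind.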

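2. Enabling coredump

2.1 Configure a writable directory for core files

  • Since core files will be generated, it is convenient to collect them in one dedicated directory.
  • Here, that directory is assumed to be ${HOME}/corefile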
sh
mkdir -p "${HOME}/corefile"
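
  • With a plain path pattern, the core file is written with the crashing process's own permissions, so the directory must be writable by the user who runs the program. A minimal sketch that creates the directory and checks this; the helper name prepare_core_dir and the CORE_DIR override variable are assumptions made for the example.

```shell
# Create the core file directory if needed and confirm it is writable.
# prepare_core_dir is a hypothetical helper, not a standard command.
prepare_core_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    [ -w "$dir" ]
}

# CORE_DIR is a hypothetical override; the default matches the directory above.
core_dir="${CORE_DIR:-${HOME}/corefile}"
if prepare_core_dir "$core_dir"; then
    echo "core files can be written to ${core_dir}"
else
    echo "cannot write to ${core_dir}" >&2
fi
```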

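2.2 Adjust the core file ulimit

  • The purpose of this step is to allow the operating system to write coredump files and to remove any cap on their size.
  • The command only affects the current shell session; to make it persistent, add it to .zshrc or .bashrc.
  • The commands are: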
sh
# raise the limit
ulimit -c unlimited

# check
ulimit -c
# on success, this prints: unlimited
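
  • To persist the setting as suggested above, one option is to append the command to the shell rc file only when it is not there yet, so repeated runs do not pile up duplicate lines. This is a sketch; the helper name persist_core_ulimit is made up for the example.

```shell
# Append "ulimit -c unlimited" to an rc file only if the exact line is missing.
# persist_core_ulimit is a hypothetical helper name.
persist_core_ulimit() {
    rc_file="$1"
    line='ulimit -c unlimited'
    touch "$rc_file" || return 1
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Show the soft and hard limits. An unprivileged user can raise the soft
# limit only up to the hard limit.
echo "soft: $(ulimit -S -c)  hard: $(ulimit -H -c)"
```

  • Usage would look like persist_core_ulimit "${HOME}/.bashrc" (or .zshrc). The new limit applies to shells started afterwards.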

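2.3 Adjust the kernel core pattern

  • This step sets the file name format and path of the core file.
  • sysctl -w achieves the same effect, but editing the file explicitly is still recommended, since the setting then persists across reboots.
  • Edit /etc/sysctl.conf and append the following line at the end of the file: kernel.core_pattern = [your_path]/corefile/my-core-%e-%p-%t
  • Replace [your_path] with the actual path.
  • After the change, /etc/sysctl.conf looks roughly like this: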
sh
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.default.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6=0
kernel.printk = 5
kernel.sysrq = 1
kernel.core_pattern = [your_path]/corefile/my-core-%e-%p-%t
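
  • In the pattern, %e stands for the executable name, %p for the PID of the dumped process, and %t for the dump time in seconds since the Epoch (see core(5)). The sketch below imitates that expansion with sed to show what the resulting file name looks like. The kernel does the real expansion; the function name expand_core_pattern and the /tmp/corefile path are placeholders for this example.

```shell
# Imitate how the kernel fills in %e, %p and %t in kernel.core_pattern.
# Sketch only: assumes the executable name contains no "/" or "&".
expand_core_pattern() {
    pattern="$1"; exe="$2"; pid="$3"; ts="$4"
    printf '%s\n' "$pattern" |
        sed -e "s/%e/${exe}/g" -e "s/%p/${pid}/g" -e "s/%t/${ts}/g"
}

# Preview the name a crash of "test" would get right now
# (the shell's own PID stands in for the crashing process).
expand_core_pattern '/tmp/corefile/my-core-%e-%p-%t' test "$$" "$(date +%s)"
```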

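2.4 Apply the sysctl settings

TIP

  • What sysctl is: a tool that lets system administrators change kernel parameters at runtime, without recompiling the kernel or rebooting.
  • Common sysctl options:
    • -n: print values without the key name
    • -e: ignore errors about unknown keys
    • -N: print names only
    • -w: use this when changing a sysctl setting
    • -p: load kernel parameter settings from a configuration file (/etc/sysctl.conf by default)
    • -a: print all currently available kernel parameters and their values
    • -A: print all currently available kernel parameters and their values in table form (an alias of -a in recent procps-ng versions)

  • To make the core_pattern setting take effect, run: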
sh
# load /etc/sysctl.conf (requires root)
sysctl -p
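
  • After loading, it helps to confirm what the kernel actually uses. The active value is exposed at /proc/sys/kernel/core_pattern. Per core(5), a pattern starting with "|" pipes the dump to a program instead of writing a file, and a pattern without a leading "/" is resolved relative to the crashing process's working directory. A small sketch of that check; the function name classify_core_pattern is made up.

```shell
# Classify a core_pattern value by how the kernel will treat it.
# classify_core_pattern is a hypothetical helper name.
classify_core_pattern() {
    case "$1" in
        '|'*) echo pipe ;;      # dump is piped to a handler program
        /*)   echo absolute ;;  # written to a fixed directory
        *)    echo relative ;;  # written relative to the process's cwd
    esac
}

# Print the pattern that is currently active, if it is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    active="$(cat /proc/sys/kernel/core_pattern)"
    echo "active pattern: ${active} ($(classify_core_pattern "$active"))"
fi
```

  • If the result is pipe rather than absolute, the setting from /etc/sysctl.conf has not taken effect, or another configuration file is overriding it.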

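3. Debugging with GDB

  • With the steps above, coredump is enabled. Next, a concrete example shows how it works.

3.1 Prepare a file that is expected to crash

  • Purpose of the file: it contains a deliberate fault, so that running it produces a coredump file.
  • The file content is as follows: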
c
// test.c

static int g_t1;
static int g_t2 = 2;

int div(int div_i, int div_j) {
    int a4, b4;
    char *c4;
    
    a4 = div_i + 3;
    b4 = div_j + 3;
    c4 = "divf";

    return (div_i / div_j);
}

int sub(int sub_i, int sub_j) {
    int a3, b3;
    char *c3;

    a3 = sub_i + 2;
    b3 = sub_j + 2;
    c3 = "subf";
    div(a3, 0);    // passes 0 as the divisor, so div() faults

    return (sub_i - sub_j);
}

int add(int add_i, int add_j) {
    int a2, b2;
    char *c2;

    a2 = add_i + 1;
    b2 = add_j + 1;
    c2 = "addf";
    sub(a2, b2);

    return (add_i + add_j);
}

int main(int argc, char *argv[]) {
    int a1, b1;
    char *c1;
    static int u1;
    static int x1 = 1;

    a1 = 1;
    b1 = 0;
    c1 = "main function";

    add(a1, b1);

    return 0;
}
  • Compile the file to get the executable test
sh
gcc -g -o test test.c
  • Running the executable makes it crash and produces a core file, named for example [your_path]/corefile/my-core-new-2172110-1722830165
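
  • When a process is killed by a signal, the shell reports an exit status of 128 plus the signal number, so the status alone already hints at what happened. A minimal sketch that maps the status back to a signal name; the function name signal_from_status is made up for this example.

```shell
# Map a shell exit status to the name of the signal that killed the process.
# Returns 1 when the status does not indicate death by signal.
# signal_from_status is a hypothetical helper name.
signal_from_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        kill -l "$((status - 128))"
    else
        return 1
    fi
}

# Typical use right after the crashing command:
#   ./test; signal_from_status "$?"
signal_from_status 136
```

  • Status 136 is 128 + 8, and signal 8 is SIGFPE on Linux, which is the signal an integer division by zero raises.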

3.2 Debug with GDB

  • The command format is gdb [bin] [corefile]
sh
gdb ./test [your_path]/corefile/my-core-new-2172110-1722830165
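
  • Because the file name contains the PID and a timestamp, it changes on every crash. To avoid copying it by hand, the newest core file can be picked by modification time. A sketch, assuming the my-core- prefix from the core pattern; the function name latest_core is made up.

```shell
# Print the most recently modified core file in a directory.
# Assumes the "my-core-" prefix; prints nothing if no file matches.
latest_core() {
    ls -t "$1"/my-core-* 2>/dev/null | head -n 1
}

# Example use, with the directory from section 2.1:
#   gdb ./test "$(latest_core "${HOME}/corefile")"
```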
  • To GDB, the core file is a static snapshot, so the call stack can be inspected directly:
sh
# gdb ./test [your_path]/corefile/my-core-new-2172110-1722830165

GNU gdb (GDB) Red Hat Enterprise Linux 9.2-4.tl3
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...
[New LWP 2172110]
Core was generated by `./test'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x000000000040112e in div (div_i=4, div_j=0) at test.c:13
13          return (div_i / div_j);

(gdb) bt
#0  0x000000000040112e in div (div_i=4, div_j=0) at test.c:13
#1  0x000000000040116a in sub (sub_i=2, sub_j=1) at test.c:24
#2  0x00000000004011a9 in add (add_i=1, add_j=0) at test.c:37
#3  0x00000000004011e7 in main (argc=1, argv=0x7ffda69d2d08) at test.c:53
  • At this point, the coredump debugging workflow works end to end.
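
  • For scripts or crash reports, the same backtrace can be collected without an interactive session by using GDB's -batch and -ex options. A minimal sketch; the wrapper name print_backtrace is made up, and the paths in the usage line are placeholders.

```shell
# Print the backtrace of a core file non-interactively.
# print_backtrace is a hypothetical wrapper around "gdb -batch -ex bt".
print_backtrace() {
    bin="$1"; core="$2"
    if [ ! -r "$core" ]; then
        echo "core file not readable: ${core}" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 1
    fi
    # -batch exits after the commands run; -ex bt runs the bt command.
    gdb -batch -ex bt "$bin" "$core"
}

# Example use (paths are placeholders):
#   print_backtrace ./test [your_path]/corefile/my-core-new-2172110-1722830165
```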