We’ve all been there: you build your application, run it, and watch it crash, possibly with an error message like Segmentation fault (core dumped).
The (core dumped) part is what we’ll explore today. Most of the information on this page is documented more concisely in the core dump manual (the core(5) man page).
What is a Core Dump?
When an application is launched, it becomes a process that is tracked by the operating system, and uses up memory (RAM) to store its instructions (compiled code) and data (the variables in your code). A core dump is a file containing the memory that a process was using at the moment it crashed.
Similar to how an application can be run through a debugger (like gdb) to add breakpoints and inspect values, a core dump can be analyzed through a debugger to see what was going on at the moment the process crashed.
Generating Core Dumps
If you’re lucky, when you get a message saying something like Aborted (core dumped), you’ll have a file called core in your current working directory.
Rarely are you so lucky.
For a core dump to be generated, a few things need to happen:
- The process needs to abort or be killed by a signal that generates a core dump
- The kernel needs to be configured to generate the core dumps
- The shell that launched the process needs to have a non-zero max core file size
- The process needs to be able to write to the corresponding core dump file
Testing the Core Dump Generation
An easy way to check whether core dumps are generated is to run sleep 60 and kill it by pressing Ctrl+\. That key combination sends a SIGQUIT signal, which is similar to the SIGINT signal sent by pressing Ctrl+C, except that SIGINT doesn’t generate a core dump. If a core dump appears in your current directory (likely a file called core), then core dump generation already works and you can skip ahead to Debugging Core Dumps.
Signals Generating Core Dumps
A process may manually generate a core dump by calling the abort function in the standard C library. If you wish to keep your own process running, you can call fork to create an identical child process and abort that one instead.
A process may also generate a core dump if it was killed by a signal that generates core dumps. Below are some common signals with my commentary:
SIGSEGV Segmentation fault; likely some invalid pointer dereference
SIGABRT Calling abort(), or not handling an exception in C++
SIGQUIT Quit from keyboard (Ctrl+\)
For a full list of signals that generate core dumps, refer to the signal(7) man page.
Configuring kernel.core_pattern
The Linux kernel has some parameters that can be modified at runtime. You can view all of these parameters by running sysctl -a.
The parameter that configures where core dump files go is kernel.core_pattern (documented in the core(5) man page). You can check its current value by running sysctl kernel.core_pattern (or by reading the /proc/sys/kernel/core_pattern file). By default, it should be set to core, meaning that if an application produces a core dump it will appear in the process's current working directory as a file named core. However, the default value is usually modified by the Linux distro being used.
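In any case, you can restore the default value (core) by modifying kernel.core_pattern in one of two ways:
- You can run sudo sysctl kernel.core_pattern=core
- You can overwrite the file /proc/sys/kernel/core_pattern. This requires sudo, and the write itself must run as root (for example through sudo tee), because a plain shell redirection is performed by your unprivileged shell.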
I personally recommend setting the pattern to core-%e-%t, which appends the application name (%e) and a Unix timestamp (%t) to the file name. This makes it easier to organize and identify a core dump when dealing with many of them.
Since this is a kernel parameter, the sudo sysctl command will need to be run outside of any docker containers. If the core file pattern refers to an absolute path, it will be the absolute path in the host machine, not in a container.
To set this pattern permanently, you can add the line kernel.core_pattern = core-%e-%t to /etc/sysctl.conf, or to another .conf file under /etc/sysctl.d/.
Configuring ulimit
The kernel.core_pattern parameter is a kernel-wide setting, affecting everything. There are also some per-user or per-process limits. These limits can be found by running ulimit -a (or by running cat /proc/$$/limits). ulimit is not (and cannot be) an executable: it is a shell builtin command because it needs to modify the running shell, and it is documented in the bash manual along with the other builtins. The csh shell uses a builtin called limit, and the syntax is slightly different.
There is a concept of a hard limit and a soft limit. The soft limit is the effective limitation, and can be configured by the user to be increased up to a maximum of the hard limit. Privileged (root) processes can increase or decrease the hard limit. Non-privileged processes can decrease the hard limit, but usually cannot increase it back to the original hard limit. You can pass -H or -S to ulimit to print the hard or soft limit respectively.
The limit relevant to core dumps is the “core file size” limit, and can be printed by running ulimit -c. If this prints out 0, then core dumps will not be generated. You can increase the limit to unlimited by running ulimit -c unlimited. Do not use sudo: ulimit is a shell builtin, so there is no executable for sudo to run, and the limit has to change in your own shell anyway. Fortunately, the hard limit is usually already set to unlimited.
ulimit settings are inherited by child processes (including sub-shells), but they do not persist across sessions, nor do they affect parent processes. To set limits permanently, you can modify /etc/security/limits.conf.
If ulimit reports a permission error, the possible causes and solutions are:
- You are trying to set the soft limit higher than the hard limit
  - Solution: Increase the hard limit first
- You are trying to increase the hard limit but are not root
  - Solution: Increase the hard limits for your user by modifying /etc/security/limits.conf (and logging in again or rebooting), or run your application as root
  - Reminder: root in a docker container does not mean that you have root permissions in the host.
Other Possible Issues
Fixing the kernel.core_pattern and ulimit -c settings solves most problems. If core dumps are still not being generated, there may be issues during the writing of the core file itself. For example, the destination directory may not be writable. This can happen if the directory does not exist, if it is missing write permission, if the filesystem ran out of space, or if it was mounted without write permissions. Another reason could be that the destination file already exists and cannot be overwritten.
There are some other edge cases that are covered in the core dump manual.
Debugging Core Dumps
Let’s use the following code sample to generate a core dump and debug it:
#include <stdio.h>

void foo() {
    int* ptr = NULL;
    printf("*ptr = %d\n", *ptr);
}

int main() {
    foo();
    return 0;
}
We’ll walk through different setups to get a full-fledged debugging experience.
Missing Debug Symbols
If you build your application by running gcc -o app main.c and run it (./app), it should immediately print Segmentation fault (core dumped). If it doesn’t, or if you cannot find where the core file was dumped, follow the instructions in the first section to generate them.
With a core file named core, you can use the gdb debugger to inspect the core dump by running gdb app core. You should get something like the following:
Reading symbols from app...
(No debugging symbols found in app)
...
Core was generated by `./app'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005633ff9aa161 in foo ()
Notice that gdb is able to determine that the program crashed when running a function called foo. If you type in backtrace at the gdb prompt, you should see where this function got called:
(gdb) backtrace
#0 0x00005633ff9aa161 in foo ()
#1 0x00005633ff9aa18e in main ()
So foo was called from main. This is good information, but we can do better if we resolve gdb’s warning at the start: No debugging symbols found in app.
Adding Debug Symbols
If you now compile your program with the -g flag, like so: gcc -g -o app main.c, the resulting binary will contain the debug symbols that gdb was missing. If you’re using CMake, you should use the -DCMAKE_BUILD_TYPE=Debug or -DCMAKE_BUILD_TYPE=RelWithDebInfo options to enable debug information.
With debug information in your application (you don’t need to generate another core dump), gdb app core will print:
Reading symbols from app...
warning: exec file is newer than core file.
...
Core was generated by `./app'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000562332807161 in foo () at main.c:5
5 printf("*ptr = %d\n", *ptr);
Now, we can see exactly which line was running (main.c:5) when the program crashed. Getting the stack trace also provides more information:
(gdb) backtrace
#0 0x0000562332807161 in foo () at main.c:5
#1 0x000056233280718e in main () at main.c:9
This is the ideal setup.
Copying Another Environment
Sometimes, you may receive a core dump generated in another environment. If the environment is too different (e.g. different hardware/platform), then you will need to use a cross-compiled gdb, which is beyond the scope of this post. However, if the environment is only slightly different, like another distro running on the same platform, or a docker container running on the same host, then you should be able to debug those core dumps as well. We’ll call the other environment the target environment, and the debugging environment (where you’ll run gdb) the host environment.
Not only do you need to retrieve the core dump (let’s call it core-target) from the target environment, you also need to retrieve the application (app-target) from that target if you’re not able to generate the exact same one locally. Sometimes, you may be able to run gdb app-target core-target and begin debugging immediately. Other times, you won’t be so lucky:
Reading symbols from app...
...
Core was generated by `./app'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000557f95e9a149 in ?? ()
(gdb) backtrace
#0 0x0000557f95e9a149 in ?? ()
#1 0x0000557f95e9a180 in ?? ()
#2 0x0000000000000000 in ?? ()
gdb doesn’t know how to resolve any of those symbols. Quick note: another reason for gdb not being able to resolve any symbols is if the core dump does not match the application, i.e. you are trying to debug a core dump for app1 as if it came from app2, or even from a different version of app1. Be careful.
Applications run code from more than just the application that was built: they also dynamically load a number of shared libraries. You can copy those shared libraries from the target environment to a local directory and tell gdb to look in that directory by running set sysroot ${directory} and set solib-search-path ${directory}.
For example, to debug an application that crashes in a docker container, you may need to run something like:
CONTAINER_NAME=MyContainer
LOCAL_SYSROOT=sysroot
# The trailing "/." makes docker cp copy the contents of each directory
LIB_PATHS="/lib/. /lib64/. /usr/lib/. /usr/lib64/. /usr/local/lib/."
mkdir -p "${LOCAL_SYSROOT}"
# LIB_PATHS is intentionally left unquoted so that it splits into separate paths
for LIB_PATH in ${LIB_PATHS}; do
    mkdir -p "${LOCAL_SYSROOT}${LIB_PATH}"
    docker cp "${CONTAINER_NAME}:${LIB_PATH}" "${LOCAL_SYSROOT}${LIB_PATH}"
done
# ...
gdb -ex "set sysroot ${LOCAL_SYSROOT}" -ex "set solib-search-path ${LOCAL_SYSROOT}" app core
Adjust the LIB_PATHS to the library paths on your specific system. Don’t forget to inform gdb of the new sysroot by running the set sysroot and set solib-search-path commands mentioned above.
Conclusion
Core dumps are a handy way of conducting a post-mortem analysis to debug a crashing program. We’ve gone over how to generate core dumps, and how to begin analyzing them in gdb. We did not go over the various features supported by gdb: there are enough tutorials for that going around.