Kdump is a kernel feature used to capture crash dumps when the system or kernel crashes. To enable kdump we have to reserve a portion of physical RAM, which is used to run the kdump kernel in the event of a kernel panic or crash.
When a kernel crash or panic occurs, the running kernel uses kexec to boot the kdump kernel from the reserved memory. The kdump kernel then copies the contents of RAM to a vmcore file, either on a local disk or on a remote location, and finally reboots the box.
By analyzing the crash dumps we can find the root cause of the system failure. If you have OS vendor support, you can share the crash dumps with the vendor for analysis.
In this article we will demonstrate how to enable kdump on RHEL 7 and CentOS 7.
Step:1 Install ‘kexec-tools’ using yum command
Use the below yum command to install ‘kexec-tools’ package in case it is not installed.
[root@cloud ~]# yum install kexec-tools
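If you want to skip the install when the package is already present, a small guard can help. This is only a sketch for RPM-based systems; the helper name ensure_pkg is made up for illustration and is not part of yum or rpm.

```shell
# Install a package only if rpm does not already report it as installed.
# ensure_pkg is a hypothetical helper name, not part of yum or rpm.
ensure_pkg() {
    local pkg="$1"
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg is already installed"
    else
        yum install -y "$pkg"
    fi
}
```

Run it as root, for example: ensure_pkg kexec-tools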
Step:2 Update the GRUB2 file to Reserve Memory for Kdump kernel
Edit the GRUB2 file (/etc/default/grub) and add the parameter ‘crashkernel=<Reserved_size_of_RAM>’ to the line beginning with ‘GRUB_CMDLINE_LINUX’.
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=128M vconsole.keymap=us rhgb quiet"
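The same edit can be scripted instead of done by hand. The sketch below assumes GNU sed and a single GRUB_CMDLINE_LINUX line in double quotes, as in the example above; set_crashkernel is a hypothetical helper name. It replaces an existing crashkernel value or appends the parameter if it is missing.

```shell
# Add or update crashkernel=<size> inside GRUB_CMDLINE_LINUX.
# set_crashkernel is a hypothetical helper; the file argument defaults
# to /etc/default/grub. Requires GNU sed (for -i).
set_crashkernel() {
    local size="$1"
    local file="${2:-/etc/default/grub}"
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$file"; then
        # Parameter already present: replace its value.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=.*\)crashkernel=[^ \"]*/\1crashkernel=${size}/" "$file"
    else
        # Parameter missing: append it before the closing quote.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 crashkernel=${size}\"/" "$file"
    fi
}
```

Take a backup of /etc/default/grub first, then run as root: set_crashkernel 128M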
Execute the below command to regenerate grub2 configuration.
[root@cloud ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
In case of UEFI firmware, use the below command (on CentOS the directory is /boot/efi/EFI/centos instead of /boot/efi/EFI/redhat)
[root@cloud ~]# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
The regenerated configuration tells the kernel to reserve 128 MB of RAM for the kdump kernel from the next boot onward.
Reboot the box now using the below command:
[root@cloud ~]# shutdown -r now
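Once the box is back up, it is worth confirming that the memory was really reserved. The sketch below reads the crashkernel value from the kernel command line and converts the byte count in /sys/kernel/kexec_crash_size to MiB. The function names are hypothetical, and the optional arguments exist only so the helpers can be tried on sample input.

```shell
# Print the crashkernel= value from a kernel command line string
# (defaults to the running kernel's /proc/cmdline).
crashkernel_arg() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Convert the byte count in /sys/kernel/kexec_crash_size to MiB.
# The file argument is only there so a sample file can be used instead.
reserved_mib() {
    local file="${1:-/sys/kernel/kexec_crash_size}"
    echo $(( $(cat "$file") / 1024 / 1024 ))
}
```

If crashkernel_arg prints nothing, or reserved_mib prints 0, the reservation did not take effect and the GRUB2 step should be rechecked.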
Step:3 Update the dump location & default action in the file (/etc/kdump.conf)
To store the crash dump (vmcore file) on a local file system, edit the file ‘/etc/kdump.conf’ and specify the location as per your setup. In my case I am using a separate local file system (/var/crash). It is recommended that this file system have free space at least equal to the size of your system’s RAM. Kdump can also compress the dump data through the ‘core_collector’ directive (core_collector makedumpfile -c), where -c enables compression.
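The free-space recommendation can be checked with a few lines of shell. This is a rough sketch with hypothetical function names; it compares the available space on the dump file system with MemTotal from /proc/meminfo, both in KiB.

```shell
# Total RAM in KiB, as reported by the kernel.
mem_total_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Free space in KiB on the file system holding the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeeds if the directory has at least as much free space as needed.
# The second argument (KiB) defaults to the size of RAM.
has_dump_space() {
    local dir="$1"
    local need="${2:-$(mem_total_kb)}"
    [ "$(free_kb "$dir")" -ge "$need" ]
}
```

For example: has_dump_space /var/crash && echo "enough room for a full dump". The check is conservative, because a compressed dump is normally much smaller than RAM.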
If kdump fails to store the dump file at the specified location, the action named in the ‘default’ directive is performed. In my case the default action is reboot.
Update the below three directives in the kdump.conf file.
[root@cloud ~]# vi /etc/kdump.conf
path /var/crash
core_collector makedumpfile -c
default reboot
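The stock kdump.conf is mostly comments, so it helps to list only the directives that are actually in effect after editing. A minimal sketch (active_directives is a hypothetical helper name):

```shell
# Print the directives that are in effect: drop comments and blank lines.
active_directives() {
    local file="${1:-/etc/kdump.conf}"
    grep -Ev '^[[:space:]]*(#|$)' "$file"
}
```

With the three directives above in place, the output should contain the path, core_collector and default lines.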
Different Options to store dump :
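Besides a local directory, the kdump.conf shipped with RHEL 7 and CentOS 7 documents other dump targets: a raw partition (raw), a file system on a specific partition (a file system type such as ext4 or xfs followed by the device), an NFS export (nfs) and a remote host over SSH (ssh). As a rough sketch, the hypothetical helper below reports which of these targets a configuration file selects. The list of file system types is my assumption based on the comments in the stock file.

```shell
# Report the first dump target directive found in a kdump.conf file.
# Prints "local" when only the default (path on the root file system) applies.
# kdump_target is a hypothetical helper name.
kdump_target() {
    local file="${1:-/etc/kdump.conf}"
    awk '
        $1 ~ /^(raw|nfs|ssh|ext2|ext3|ext4|xfs|btrfs|minix)$/ { print $1, $2; found = 1; exit }
        END { if (!found) print "local" }
    ' "$file"
}
```

With only ‘path /var/crash’ configured, as in this article, the function prints local.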
Step:4 Start and enable kdump service
[root@cloud ~]# systemctl start kdump.service
[root@cloud ~]# systemctl enable kdump.service
Step:5 Now Test Kdump by manually crashing the system
Before crashing your system, verify that the kdump service is running using the below commands.
[root@cloud crash]# systemctl is-active kdump.service
[root@cloud crash]# service kdump status
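The kernel itself also reports whether a crash kernel is loaded, through /sys/kernel/kexec_crash_loaded (1 means loaded). A minimal sketch of that check follows; the function name is hypothetical and the optional file argument is only for trying it on a sample file.

```shell
# Succeeds when the kernel reports that a crash (capture) kernel is loaded.
# /sys/kernel/kexec_crash_loaded contains 1 when loaded, 0 otherwise.
crash_kernel_loaded() {
    local file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}
```

For example: crash_kernel_loaded && echo "kdump kernel is loaded"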
To test our kdump configuration we will manually crash our system with below commands.
[root@cloud ~]# echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
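Because these two commands bring the machine down immediately, you may prefer to wrap them in a function that asks for explicit confirmation and flushes pending writes first. The sketch below is one way to do that. The function name and the PROC_ROOT variable are hypothetical; PROC_ROOT only exists so the function can be dry-run against a scratch directory.

```shell
# Crash the machine on purpose, but only when called with the word "yes".
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
trigger_crash() {
    local proc="${PROC_ROOT:-/proc}"
    if [ "$1" != "yes" ]; then
        echo "refusing to crash: call 'trigger_crash yes' to confirm" >&2
        return 1
    fi
    sync                                   # flush pending writes to disk first
    echo 1 > "$proc/sys/kernel/sysrq"      # enable all SysRq functions
    echo c > "$proc/sysrq-trigger"         # force a kernel panic
}
```

Run as root with: trigger_crash yes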
The system will panic, write the dump and reboot. Afterwards the crash dump file (vmcore) will be available under the ‘/var/crash’ file system.
[root@cloud ~]# ls -lR /var/crash
/var/crash:
total 0
drwxr-xr-x. 2 root root 42 Mar 4 03:02 127.0.0.1-2016-03-04-03:02:17
/var/crash/127.0.0.1-2016-03-04-03:02:17:
total 135924
-rw-------. 1 root root 139147524 Mar 4 03:02 vmcore
-rw-r--r--. 1 root root 35640 Mar 4 03:02 vmcore-dmesg.txt
[root@cloud ~]#
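Each crash gets its own time-stamped directory, so after a few tests it is handy to locate the newest vmcore automatically. A small sketch, assuming GNU find (for -printf); latest_vmcore is a hypothetical helper name.

```shell
# Print the most recently modified vmcore below the dump directory.
# Relies on GNU find's -printf.
latest_vmcore() {
    local dir="${1:-/var/crash}"
    find "$dir" -type f -name vmcore -printf '%T@ %p\n' \
        | sort -n | tail -n 1 | cut -d' ' -f2-
}
```

The vmcore-dmesg.txt file in the same directory holds the kernel log from the crashed system and can be read with any pager without extra tools.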
Step:6 Use ‘crash’ command to analyze and debug crash dumps
Crash is the utility used to debug and analyze a crash dump (vmcore file).
To use crash, make sure two packages are installed: ‘crash’ and ‘kernel-debuginfo’
[root@cloud ~]# yum install crash
To install the ‘kernel-debuginfo’ package, first enable the debug repo. Edit the repo file /etc/yum.repos.d/CentOS-Debuginfo.repo
and change ‘enabled=0’ to ‘enabled=1’
[root@cloud ~]# yum install kernel-debuginfo
Once kernel-debuginfo is installed, run the crash command below. It gives us a crash prompt where we can run commands to find process info, the list of open files and other details from the time the system crashed.
[root@cloud ~]# crash /var/crash/127.0.0.1-2016-03-04-14\:20\:06/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
crash>
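Note that crash needs the vmlinux that matches the kernel which crashed. The path built from uname -r is only correct when the system has booted the same kernel release again. The sketch below (hypothetical function names) builds the path for a given release and checks that the debuginfo file is really there before crash is started.

```shell
# Build the path where kernel-debuginfo installs vmlinux for a kernel release
# (defaults to the running kernel).
debug_vmlinux() {
    local release="${1:-$(uname -r)}"
    echo "/usr/lib/debug/lib/modules/${release}/vmlinux"
}

# Succeeds if the debuginfo vmlinux for that release is installed.
have_debug_vmlinux() {
    [ -f "$(debug_vmlinux "$1")" ]
}
```

If have_debug_vmlinux fails for the release shown in the crash directory, installing the matching package version, for example with yum install kernel-debuginfo-$(uname -r), should help.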
Type the ‘ps’ command to list the processes that were running when the system crashed.
crash> ps
To view the files that were open when the system crashed, type the ‘files’ command at the crash prompt.
crash> files
PID: 5577   TASK: ffff88007b44f300  CPU: 0   COMMAND: "bash"
ROOT: /    CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
  1 ffff880036b73900 ffff880068c409c0 ffff8800794a8d10 REG  /proc/sysrq-trigger
  2 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
 10 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
255 ffff880036b85000 ffff8800796fa540 ffff88007966f4d0 CHR  /dev/pts/0
crash>
Type the ‘sys’ command to list the system info at the time of the crash.
crash> sys
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.10.1.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2016-03-04-14:20:06/vmcore
        CPUS: 1
        DATE: Fri Mar 4 14:20:01 2016
      UPTIME: 00:02:00
LOAD AVERAGE: 0.75, 0.48, 0.19
       TASKS: 115
    NODENAME: cloud.linuxtechi.com
     RELEASE: 3.10.0-327.10.1.el7.x86_64
     VERSION: #1 SMP Tue Feb 16 17:03:50 UTC 2016
     MACHINE: x86_64 (2388 Mhz)
      MEMORY: 2 GB
       PANIC: "SysRq : Trigger a crash"
crash>
To get help on any command at the crash prompt, type ‘help <command>’, for example ‘help ps’.
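The commands used above can also be run in one go, which is handy when a report has to be sent to a vendor. The sketch below writes a short command list to a temporary file and passes it to crash with -i (read commands from a file) and -s (suppress the startup banner). crash_report is a hypothetical helper name, and bt (backtrace of the panicking task) is a crash command added here on top of the ones shown in this article.

```shell
# Run a fixed list of crash commands non-interactively and print the result.
# crash_report is a hypothetical helper name.
crash_report() {
    local vmlinux="$1"
    local vmcore="$2"
    local cmds rc
    cmds=$(mktemp)
    # One crash command per line; quit ends the session.
    printf '%s\n' sys bt ps quit > "$cmds"
    crash -s -i "$cmds" "$vmlinux" "$vmcore"
    rc=$?
    rm -f "$cmds"
    return "$rc"
}
```

Call it with the vmlinux path first and the vmcore path second, and redirect the output to a file to keep the report.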
That concludes the article. Please don’t hesitate to share it if you enjoyed it.