linux-kernel - Re: [PATCH V2] panic: Add sysctl/cmdline to dump all CPUs backtraces on oops event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202003241137.A90B14A@keescook>
Date:   Tue, 24 Mar 2020 11:37:19 -0700
From:   Kees Cook <keescook@...omium.org>
To:     "Guilherme G. Piccoli" <gpiccoli@...onical.com>
Cc:     linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        mcgrof@...nel.org, yzaikin@...gle.com, tglx@...utronix.de,
        akpm@...ux-foundation.org, linux-api@...r.kernel.org,
        rdunlap@...radead.org, willy@...radead.org, kernel@...ccoli.net
Subject: Re: [PATCH V2] panic: Add sysctl/cmdline to dump all CPUs backtraces
 on oops event

On Mon, Mar 23, 2020 at 07:30:35PM -0300, Guilherme G. Piccoli wrote:
> Usually when kernel reach an oops condition, it's a point of no return;
> in case not enough debug information is available in the kernel splat,
> one of the last resorts would be to collect a kernel crash dump and
> analyze it. The problem with this approach is that in order to collect
> the dump, a panic is required (to kexec-load the crash kernel). When
> in an environment of multiple virtual machines, users may prefer to
> try living with the oops, at least until being able to properly
> shutdown their VMs / finish their important tasks.
> 
> This patch implements a way to collect a bit more debug details when an
> oops event is reached, by printing all the CPUs backtraces through the
> usage of NMIs (on architectures that support that). The sysctl/kernel
> parameter added (and documented) here was called "oops_all_cpu_backtrace"
> and when set will (as the name suggests) dump all CPUs backtraces.
> 
> Far from ideal, this may be the last option though for users that for
> some reason cannot panic on oops. Most of times oopses are clear enough
> to indicate the kernel portion that must be investigated, but in virtual
> environments it's possible to observe hypervisor/KVM issues that could
> lead to oopses shown in other guests CPUs (like virtual APIC crashes).
> This patch hence aims to help debug such complex issues without
> resorting to kdump.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@...onical.com>

Reviewed-by: Kees Cook <keescook@...omium.org>

-Kees

> ---
> 
> 
> V2: Implemented grammar suggestions from Randy, Andrew and
> Matthew. Thanks in advance for the reviews!
> Cheers,
> 
> Guilherme
> 
> 
>  .../admin-guide/kernel-parameters.txt         |  8 +++++++
>  Documentation/admin-guide/sysctl/kernel.rst   | 17 +++++++++++++++
>  include/linux/kernel.h                        |  6 ++++++
>  kernel/panic.c                                | 21 +++++++++++++++++++
>  kernel/sysctl.c                               | 11 ++++++++++
>  5 files changed, 63 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7a14caac6c94..7db622028c00 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3333,6 +3333,14 @@
>  			This will also cause panics on machine check exceptions.
>  			Useful together with panic=30 to trigger a reboot.
>  
> +	oops_all_cpu_backtrace=
> +			[KNL] Should kernel generate backtraces on all cpus
> +			when oops occurs - this should be a last measure resort
> +			in case	a kdump cannot be collected, for example.
> +			Defaults to 0 and can be controlled by the sysctl
> +			kernel.oops_all_cpu_backtrace.
> +			Format: <integer>
> +
>  	page_alloc.shuffle=
>  			[KNL] Boolean flag to control whether the page allocator
>  			should randomize its free lists. The randomization may
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 8b4ff69d2348..8660001d3a3e 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -57,6 +57,7 @@ show up in /proc/sys/kernel:
>  - msgmnb
>  - msgmni
>  - nmi_watchdog
> +- oops_all_cpu_backtrace
>  - osrelease
>  - ostype
>  - overflowgid
> @@ -572,6 +573,22 @@ numa_balancing_scan_size_mb is how many megabytes worth of pages are
>  scanned for a given scan.
>  
>  
> +oops_all_cpu_backtrace:
> +================
> +
> +If this option is set, the kernel will send an NMI to all CPUs to dump
> +their backtraces when an oops event occurs. It should be used as a last
> +resort in case a panic cannot be triggered (to protect VMs running, for
> +example) or kdump can't be collected. This file shows up if CONFIG_SMP
> +is enabled.
> +
> +0: Won't show all CPUs backtraces when an oops is detected.
> +This is the default behavior.
> +
> +1: Will non-maskably interrupt all CPUs and dump their backtraces when
> +an oops event is detected.
> +
> +
>  osrelease, ostype & version:
>  ============================
>  
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 0d9db2a14f44..6cd00257b572 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -513,6 +513,12 @@ static inline u32 int_sqrt64(u64 x)
>  }
>  #endif
>  
> +#ifdef CONFIG_SMP
> +extern unsigned int sysctl_oops_all_cpu_backtrace;
> +#else
> +#define sysctl_oops_all_cpu_backtrace 0
> +#endif /* CONFIG_SMP */
> +
>  extern void bust_spinlocks(int yes);
>  extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
>  extern int panic_timeout;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index b69ee9e76cb2..73c340418575 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -36,6 +36,24 @@
>  #define PANIC_TIMER_STEP 100
>  #define PANIC_BLINK_SPD 18
>  
> +#ifdef CONFIG_SMP
> +/*
> + * Should we dump all CPUs backtraces in an oops event?
> + * Defaults to 0, can be changed either via cmdline or sysctl.
> + */
> +unsigned int __read_mostly sysctl_oops_all_cpu_backtrace;
> +
> +static int __init oops_backtrace_setup(char *str)
> +{
> +	int rc = kstrtouint(str, 0, &sysctl_oops_all_cpu_backtrace);
> +
> +	if (rc)
> +		return rc;
> +	return 1;
> +}
> +__setup("oops_all_cpu_backtrace=", oops_backtrace_setup);
> +#endif /* CONFIG_SMP */
> +
>  int panic_on_oops = CONFIG_PANIC_ON_OOPS_VALUE;
>  static unsigned long tainted_mask =
>  	IS_ENABLED(CONFIG_GCC_PLUGIN_RANDSTRUCT) ? (1 << TAINT_RANDSTRUCT) : 0;
> @@ -515,6 +533,9 @@ void oops_enter(void)
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
>  	do_oops_enter_exit();
> +
> +	if (sysctl_oops_all_cpu_backtrace)
> +		trigger_all_cpu_backtrace();
>  }
>  
>  /*
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 238f268de486..1ac31d9d5b7e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -813,6 +813,17 @@ static struct ctl_table kern_table[] = {
>  		.proc_handler	= proc_dointvec,
>  	},
>  #endif
> +#ifdef CONFIG_SMP
> +	{
> +		.procname	= "oops_all_cpu_backtrace",
> +		.data		= &sysctl_oops_all_cpu_backtrace,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
> +	},
> +#endif /* CONFIG_SMP */
>  	{
>  		.procname	= "pid_max",
>  		.data		= &pid_max,
> -- 
> 2.25.1
> 

-- 
Kees Cook