linux-kernel - Re: [PATCH V3] panic: Add sysctl to dump all CPUs backtraces on oops event

Message-Id: <20200417174654.9af0c51afb5d9e35e5519113@linux-foundation.org>
Date:   Fri, 17 Apr 2020 17:46:54 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     "Guilherme G. Piccoli" <gpiccoli@...onical.com>
Cc:     linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-doc@...r.kernel.org, mcgrof@...nel.org,
        keescook@...omium.org, yzaikin@...gle.com, tglx@...utronix.de,
        vbabka@...e.cz, rdunlap@...radead.org, willy@...radead.org,
        kernel@...ccoli.net
Subject: Re: [PATCH V3] panic: Add sysctl to dump all CPUs backtraces on
 oops event

On Fri, 27 Mar 2020 19:41:16 -0300 "Guilherme G. Piccoli" <gpiccoli@...onical.com> wrote:

> Usually when the kernel reaches an oops condition, it's a point of no
> return; if the kernel splat does not carry enough debug information,
> one of the last resorts is to collect a kernel crash dump and analyze
> it. The problem with this approach is that collecting the dump
> requires a panic (to kexec-load the crash kernel). In an environment
> with multiple virtual machines, users may prefer to try living with
> the oops, at least until they can properly shut down their VMs or
> finish their important tasks.
> 
> This patch implements a way to collect a bit more debug detail when an
> oops event is reached, by printing all CPUs' backtraces using NMIs
> (on architectures that support that). The sysctl added (and documented)
> here is called "oops_all_cpu_backtrace"; when set, it will (as the
> name suggests) dump all CPUs' backtraces.
> 
> Far from ideal, this may nevertheless be the last option for users who
> for some reason cannot panic on oops. Most of the time oopses are clear
> enough to indicate the kernel portion that must be investigated, but in
> virtual environments it's possible to observe hypervisor/KVM issues
> that could lead to oopses shown in other guests' CPUs (like virtual
> APIC crashes). This patch hence aims to help debug such complex issues
> without resorting to kdump.
> 
> ...
>
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -513,6 +513,12 @@ static inline u32 int_sqrt64(u64 x)
>  }
>  #endif
>  
> +#ifdef CONFIG_SMP
> +extern unsigned int sysctl_oops_all_cpu_backtrace;
> +#else
> +#define sysctl_oops_all_cpu_backtrace 0
> +#endif /* CONFIG_SMP */
> +

hm, we have a ton of junk in kernel.h just to communicate between
sysctl.c and a handful of other files.  Perhaps one day someone can
move all that into a new sysctl-externs.h.
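
For readers following the kernel.h hunk quoted above, here is a minimal
userspace sketch of the stub pattern it uses: with CONFIG_SMP the name is
a real variable, without it the name is the constant 0, so callers can
test it unconditionally and the branch becomes dead code on UP builds.
Only sysctl_oops_all_cpu_backtrace and CONFIG_SMP come from the patch;
the helper names and the counter are hypothetical stand-ins, and this is
not the code the patch adds to the oops path.

```c
/*
 * Sketch only. Build with -DCONFIG_SMP to get the variable form;
 * build without it to get the constant-0 stub.
 */
#ifdef CONFIG_SMP
/* In the kernel this would be written through the sysctl interface. */
unsigned int sysctl_oops_all_cpu_backtrace;
#else
#define sysctl_oops_all_cpu_backtrace 0
#endif

/* Hypothetical stand-in: counts requests instead of sending NMIs. */
static int dumps_requested;

static void fake_dump_all_cpus(void)
{
	dumps_requested++;
}

/*
 * Hypothetical oops-path hook. The caller needs no #ifdef of its own:
 * on a non-SMP build the condition is the literal 0 and the compiler
 * can drop the call entirely.
 */
static int oops_maybe_dump_all_cpus(void)
{
	if (sysctl_oops_all_cpu_backtrace) {
		fake_dump_all_cpus();
		return 1;
	}
	return 0;
}
```

Calling oops_maybe_dump_all_cpus() returns 0 and leaves the counter
untouched unless the build defines CONFIG_SMP and the variable has been
set to a non-zero value first.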