[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <PH0PR21MB30252C55EB4F97F3D78021BDD7FC9@PH0PR21MB3025.namprd21.prod.outlook.com>
Date: Fri, 29 Apr 2022 17:53:58 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: "Guilherme G. Piccoli" <gpiccoli@...lia.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"bhe@...hat.com" <bhe@...hat.com>,
"pmladek@...e.com" <pmladek@...e.com>,
"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bcm-kernel-feedback-list@...adcom.com"
<bcm-kernel-feedback-list@...adcom.com>,
"coresight@...ts.linaro.org" <coresight@...ts.linaro.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"linux-alpha@...r.kernel.org" <linux-alpha@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-leds@...r.kernel.org" <linux-leds@...r.kernel.org>,
"linux-mips@...r.kernel.org" <linux-mips@...r.kernel.org>,
"linux-parisc@...r.kernel.org" <linux-parisc@...r.kernel.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"linux-remoteproc@...r.kernel.org" <linux-remoteproc@...r.kernel.org>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
"linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
"linux-xtensa@...ux-xtensa.org" <linux-xtensa@...ux-xtensa.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"openipmi-developer@...ts.sourceforge.net"
<openipmi-developer@...ts.sourceforge.net>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"sparclinux@...r.kernel.org" <sparclinux@...r.kernel.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"x86@...nel.org" <x86@...nel.org>,
"kernel-dev@...lia.com" <kernel-dev@...lia.com>,
"kernel@...ccoli.net" <kernel@...ccoli.net>,
"halves@...onical.com" <halves@...onical.com>,
"fabiomirmar@...il.com" <fabiomirmar@...il.com>,
"alejandro.j.jimenez@...cle.com" <alejandro.j.jimenez@...cle.com>,
"andriy.shevchenko@...ux.intel.com"
<andriy.shevchenko@...ux.intel.com>,
"arnd@...db.de" <arnd@...db.de>, "bp@...en8.de" <bp@...en8.de>,
"corbet@....net" <corbet@....net>,
"d.hatayama@...fujitsu.com" <d.hatayama@...fujitsu.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"dyoung@...hat.com" <dyoung@...hat.com>,
"feng.tang@...el.com" <feng.tang@...el.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"hidehiro.kawai.ez@...achi.com" <hidehiro.kawai.ez@...achi.com>,
"jgross@...e.com" <jgross@...e.com>,
"john.ogness@...utronix.de" <john.ogness@...utronix.de>,
"keescook@...omium.org" <keescook@...omium.org>,
"luto@...nel.org" <luto@...nel.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"paulmck@...nel.org" <paulmck@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"stern@...land.harvard.edu" <stern@...land.harvard.edu>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"vgoyal@...hat.com" <vgoyal@...hat.com>,
vkuznets <vkuznets@...hat.com>,
"will@...nel.org" <will@...nel.org>
Subject: RE: [PATCH 24/30] panic: Refactor the panic path
From: Guilherme G. Piccoli <gpiccoli@...lia.com> Sent: Wednesday, April 27, 2022 3:49 PM
>
> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
>
> This patch proposes a major refactor on the panic path based on Petr's
> idea [0] - basically we split the notifiers list in three, having a set
> of different call points in the panic path. Below a list of changes
> proposed in this patch, culminating in the panic notifiers level
> concept:
>
> (a) First of all, we improved comments all over the function
> and removed useless variables / includes. Also, as part of this
> clean-up we concentrate the console flushing functions in a helper.
>
> (b) As mentioned before, there is a split of the panic notifier list
> in three, based on the purpose of the callback. The code contains
> good documentation in form of comments, but a summary of the three
> lists follows:
>
> - the hypervisor list aims low-risk procedures to inform hypervisors
> or firmware about the panic event, also includes LED-related functions;
>
> - the informational list contains callbacks that provide more details,
> like kernel offset or trace dump (if enabled) and also includes the
> callbacks aimed at reducing log pollution or warns, like the RCU and
> hung task disable callbacks;
>
> - finally, the pre_reboot list is the old notifier list renamed,
> containing the more risky callbacks that didn't fit the previous
> lists. There is also a 4th list (the post_reboot one), but it's not
> related with the original list - it contains late time architecture
> callbacks aimed at stopping the machine, for example.
>
> The 3 notifiers lists execute in different moments, hypervisor being
> the first, followed by informational and finally the pre_reboot list.
>
> (c) But then, there is the ordering problem of the notifiers against
> the crash_kernel() call - kdump must be as reliable as possible.
> For that, a simple binary "switch" as "crash_kexec_post_notifiers"
> is not enough, hence we introduce here concept of panic notifier
> levels: there are 5 levels, from 0 (no notifier executes before
> kdump) until 4 (all notifiers run before kdump); the default level
> is 2, in which the hypervisor and (iff we have any kmsg dumper)
> the informational notifiers execute before kdump.
>
> The detailed documentation of the levels is present in code comments
> and in the kernel-parameters.txt file; as an analogy with the previous
> panic() implementation, the level 0 is exactly the same as the old
> behavior of notifiers, running all after kdump, and the level 4 is
> the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
> a deprecated one).
>
> (d) Finally, an important change made here: we now use only the
> function "crash_smp_send_stop()" to shut all the secondary CPUs
> in the panic path. Before, there was a case of using the regular
> "smp_send_stop()", but the better approach is to simplify the
> code and try to use the function which was created exclusively
> for the panic path. Experiments showed that it works fine, and
> code was very simplified with that.
>
> Functional change is expected from this refactor, since now we
> call some notifiers by default before kdump, but the goal here
> besides code clean-up is to have a better panic path, more
> reliable and deterministic, but also very customizable.
>
> [0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/
>
> Suggested-by: Petr Mladek <pmladek@...e.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@...lia.com>
> ---
>
> Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
> email thread. There's some important design decisions that worth mentioning and
> discussing:
>
> * The default panic notifiers level is 2, based on Petr Mladek's suggestion,
> which makes a lot of sense. Of course, this is customizable through the
> parameter, but would be something worthwhile to have a KConfig option to set
> the default level? It would help distros that want the old behavior
> (no notifiers before kdump) as default.
>
> * The implementation choice was to _avoid_ intricate if conditionals in the
> panic path, which would _definitely_ be present with the panic notifiers levels
> idea; so, instead of lots of if conditionals, the set/clear bits approach with
> functions called in 2 points (but executing only in one of them) is much easier
> to follow an was used here; the ordering helper function and the comments also
> help a lot to avoid confusion (hopefully).
>
> * Choice was to *always* use crash_smp_send_stop() instead of sometimes making
> use of the regular smp_send_stop(); for most architectures they are the same,
> including Xen (on x86). For the ones that override it, all should work fine,
> in the powerpc case it's even more correct (see the subsequent patch
> "powerpc: Do not force all panic notifiers to execute before kdump")
>
> There seems to be 2 cases that requires some plumbing to work 100% right:
> - ARM doesn't disable local interrupts / FIQs in the crash version of
> send_stop(); we patched that early in this series;
> - x86 could face an issue if we have VMX and do use crash_smp_send_stop()
> _without_ kdump, but this is fixed in the first patch of the series (and
> it's a bug present even before this refactor).
>
> * Notice we didn't add a sysrq for panic notifiers level - should have it?
> Alejandro proposed recently to add a sysrq for "crash_kexec_post_notifiers",
> let me know if you feel the need here Alejandro, since the core parameters are
> present in /sys, I didn't consider much gain in having a sysrq, but of course
> I'm open to suggestions!
>
> Thanks advance for the review!
>
> .../admin-guide/kernel-parameters.txt | 42 ++-
> include/linux/panic_notifier.h | 1 +
> kernel/kexec_core.c | 8 +-
> kernel/panic.c | 292 +++++++++++++-----
> .../selftests/pstore/pstore_crash_test | 5 +-
> 5 files changed, 252 insertions(+), 96 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..8d3524060ce3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -829,6 +829,13 @@
> It will be ignored when crashkernel=X,high is not used
> or memory reserved is below 4G.
>
> + crash_kexec_post_notifiers
> + This was DEPRECATED - users should always prefer the
> + parameter "panic_notifiers_level" - check its entry
> + in this documentation for details on how it works.
> + Setting this parameter is exactly the same as setting
> + "panic_notifiers_level=4".
> +
> cryptomgr.notests
> [KNL] Disable crypto self-tests
>
> @@ -3784,6 +3791,33 @@
> timeout < 0: reboot immediately
> Format: <timeout>
>
> + panic_notifiers_level=
> + [KNL] Set the panic notifiers execution order.
> + Format: <unsigned int>
> + We currently have 4 lists of panic notifiers; based
> + on the functionality and risk (for panic success) the
> + callbacks are added in a given list. The lists are:
> + - hypervisor/FW notification list (low risk);
> + - informational list (low/medium risk);
> + - pre_reboot list (higher risk);
> + - post_reboot list (only run late in panic and after
> + kdump, not configurable for now).
> + This parameter defines the ordering of the first 3
> + lists with regards to kdump; the levels determine
> + which set of notifiers execute before kdump. The
> + accepted levels are:
> + 0: kdump is the first thing to run, NO list is
> + executed before kdump.
> + 1: only the hypervisor list is executed before kdump.
> + 2 (default level): the hypervisor list and (*if*
> + there's any kmsg_dumper defined) the informational
> + list are executed before kdump.
> + 3: both the hypervisor and the informational lists
> + (always) execute before kdump.
I'm not clear on why level 2 exists. What is the scenario where
execution of the info list before kdump should be conditional on the
existence of a kmsg_dumper? Maybe the scenario is described
somewhere in the patch set and I just missed it.
> + 4: the 3 lists (hypervisor, info and pre_reboot)
> + execute before kdump - this behavior is analog to the
> + deprecated parameter "crash_kexec_post_notifiers".
> +
> panic_print= Bitmask for printing system info when panic happens.
> User can chose combination of the following bits:
> bit 0: print all tasks info
> @@ -3814,14 +3848,6 @@
> panic_on_warn panic() instead of WARN(). Useful to cause kdump
> on a WARN().
>
> - crash_kexec_post_notifiers
> - Run kdump after running panic-notifiers and dumping
> - kmsg. This only for the users who doubt kdump always
> - succeeds in any situation.
> - Note that this also increases risks of kdump failure,
> - because some panic notifiers can make the crashed
> - kernel more unstable.
> -
> parkbd.port= [HW] Parallel port number the keyboard adapter is
> connected to, default is 0.
> Format: <parport#>
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index bcf6a5ea9d7f..b5041132321d 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -10,6 +10,7 @@ extern struct atomic_notifier_head panic_info_list;
> extern struct atomic_notifier_head panic_pre_reboot_list;
> extern struct atomic_notifier_head panic_post_reboot_list;
>
> +bool panic_notifiers_before_kdump(void);
> extern bool crash_kexec_post_notifiers;
>
> enum panic_notifier_val {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 68480f731192..f8906db8ca72 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -74,11 +74,11 @@ struct resource crashk_low_res = {
> int kexec_should_crash(struct task_struct *p)
> {
> /*
> - * If crash_kexec_post_notifiers is enabled, don't run
> - * crash_kexec() here yet, which must be run after panic
> - * notifiers in panic().
> + * In case any panic notifiers are set to execute before kexec,
> + * don't run crash_kexec() here yet, which must run after these
> + * panic notifiers are executed, in the panic() path.
> */
> - if (crash_kexec_post_notifiers)
> + if (panic_notifiers_before_kdump())
> return 0;
> /*
> * There are 4 panic() calls in make_task_dead() path, each of which
> diff --git a/kernel/panic.c b/kernel/panic.c
> index bf792102b43e..b7c055d4f87f 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -15,7 +15,6 @@
> #include <linux/kgdb.h>
> #include <linux/kmsg_dump.h>
> #include <linux/kallsyms.h>
> -#include <linux/notifier.h>
> #include <linux/vt_kern.h>
> #include <linux/module.h>
> #include <linux/random.h>
> @@ -52,14 +51,23 @@ static unsigned long tainted_mask =
> static int pause_on_oops;
> static int pause_on_oops_flag;
> static DEFINE_SPINLOCK(pause_on_oops_lock);
> -bool crash_kexec_post_notifiers;
> +
> int panic_on_warn __read_mostly;
> +bool panic_on_taint_nousertaint;
> unsigned long panic_on_taint;
> -bool panic_on_taint_nousertaint = false;
>
> int panic_timeout = CONFIG_PANIC_TIMEOUT;
> EXPORT_SYMBOL_GPL(panic_timeout);
>
> +/* Initialized with all notifiers set to run before kdump */
> +static unsigned long panic_notifiers_bits = 15;
> +
> +/* Default level is 2, see kernel-parameters.txt */
> +unsigned int panic_notifiers_level = 2;
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
> +bool crash_kexec_post_notifiers;
> +
> #define PANIC_PRINT_TASK_INFO 0x00000001
> #define PANIC_PRINT_MEM_INFO 0x00000002
> #define PANIC_PRINT_TIMER_INFO 0x00000004
> @@ -109,10 +117,14 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
> }
>
> /*
> - * Stop other CPUs in panic. Architecture dependent code may override this
> - * with more suitable version. For example, if the architecture supports
> - * crash dump, it should save registers of each stopped CPU and disable
> - * per-CPU features such as virtualization extensions.
> + * Stop other CPUs in panic context.
> + *
> + * Architecture dependent code may override this with more suitable version.
> + * For example, if the architecture supports crash dump, it should save the
> + * registers of each stopped CPU and disable per-CPU features such as
> + * virtualization extensions. When not overridden in arch code (and for
> + * x86/xen), this is exactly the same as execute smp_send_stop(), but
> + * guarded against duplicate execution.
> */
> void __weak crash_smp_send_stop(void)
> {
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
> ftrace_dump(DUMP_ALL);
> }
>
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> + unblank_screen();
> +#endif
> + console_unblank();
> +
> + /*
> + * In this point, we may have disabled other CPUs, hence stopping the
> + * CPU holding the lock while still having some valuable data in the
> + * console buffer.
> + *
> + * Try to acquire the lock then release it regardless of the result.
> + * The release will also print the buffers out. Locks debug should
> + * be disabled to avoid reporting bad unlock balance when panic()
> + * is not being called from OOPS.
> + */
> + debug_locks_off();
> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> + panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT 0
> +#define PN_INFO_BIT 1
> +#define PN_PRE_REBOOT_BIT 2
> +#define PN_POST_REBOOT_BIT 3
> +
> +/*
> + * Determine the order of panic notifiers with regards to kdump.
> + *
> + * This function relies in the "panic_notifiers_level" kernel parameter
> + * to determine how to order the notifiers with regards to kdump. We
> + * have currently 5 levels. For details, please check the kernel docs for
> + * "panic_notifiers_level" at Documentation/admin-guide/kernel-parameters.txt.
> + *
> + * Default level is 2, which means the panic hypervisor and informational
> + * (unless we don't have any kmsg_dumper) lists will execute before kdump.
> + */
> +static void order_panic_notifiers_and_kdump(void)
> +{
> + /*
> + * The parameter "crash_kexec_post_notifiers" is deprecated, but
> + * valid. Users that set it want really all panic notifiers to
> + * execute before kdump, so it's effectively the same as setting
> + * the panic notifiers level to 4.
> + */
> + if (panic_notifiers_level >= 4 || crash_kexec_post_notifiers)
> + return;
> +
> + /*
> + * Based on the level configured (smaller than 4), we clear the
> + * proper bits in "panic_notifiers_bits". Notice that this bitfield
> + * is initialized with all notifiers set.
> + */
> + switch (panic_notifiers_level) {
> + case 3:
> + clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> + break;
> + case 2:
> + clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> +
> + if (!kmsg_has_dumpers())
> + clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> + break;
> + case 1:
> + clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> + clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> + break;
> + case 0:
> + clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
> + clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
> + clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
> + break;
> + }
I think the above switch statement could be done as follows:
if (panic_notifiers_level <= 3)
clear_bit(PN_PRE_REBOOT_BIT, &panic_notifiers_bits);
if (panic_notifiers_level <= 2)
if (!kmsg_has_dumpers())
clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
if (panic_notifiers_level <=1)
clear_bit(PN_INFO_BIT, &panic_notifiers_bits);
if (panic_notifiers_level == 0)
clear_bit(PN_HYPERVISOR_BIT, &panic_notifiers_bits);
That's about half the lines of code. It's somewhat a matter of style,
so treat this as just a suggestion to consider. I just end up looking
for a better solution when I see the same line of code repeated
3 or 4 times!
> +}
> +
> +/*
> + * Set of helpers to execute the panic notifiers only once.
> + * Just the informational notifier cares about the return.
> + */
> +static inline bool notifier_run_once(struct atomic_notifier_head head,
> + char *buf, long bit)
> +{
> + if (test_and_change_bit(bit, &panic_notifiers_bits)) {
> + atomic_notifier_call_chain(&head, PANIC_NOTIFIER, buf);
> + return true;
> + }
> + return false;
> +}
> +
> +#define panic_notifier_hypervisor_once(buf)\
> + notifier_run_once(panic_hypervisor_list, buf, PN_HYPERVISOR_BIT)
> +
> +#define panic_notifier_info_once(buf)\
> + notifier_run_once(panic_info_list, buf, PN_INFO_BIT)
> +
> +#define panic_notifier_pre_reboot_once(buf)\
> + notifier_run_once(panic_pre_reboot_list, buf, PN_PRE_REBOOT_BIT)
> +
> +#define panic_notifier_post_reboot_once(buf)\
> + notifier_run_once(panic_post_reboot_list, buf, PN_POST_REBOOT_BIT)
> +
> /**
> * panic - halt the system
> * @fmt: The text string to print
> @@ -198,32 +316,29 @@ void panic(const char *fmt, ...)
> long i, i_next = 0, len;
> int state = 0;
> int old_cpu, this_cpu;
> - bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
>
> - if (panic_on_warn) {
> - /*
> - * This thread may hit another WARN() in the panic path.
> - * Resetting this prevents additional WARN() from panicking the
> - * system on this thread. Other threads are blocked by the
> - * panic_mutex in panic().
> - */
> - panic_on_warn = 0;
> - }
> + /*
> + * This thread may hit another WARN() in the panic path, so
> + * resetting this option prevents additional WARN() from
> + * re-panicking the system here.
> + */
> + panic_on_warn = 0;
>
> /*
> * Disable local interrupts. This will prevent panic_smp_self_stop
> - * from deadlocking the first cpu that invokes the panic, since
> - * there is nothing to prevent an interrupt handler (that runs
> - * after setting panic_cpu) from invoking panic() again.
> + * from deadlocking the first cpu that invokes the panic, since there
> + * is nothing to prevent an interrupt handler (that runs after setting
> + * panic_cpu) from invoking panic() again. Also disables preemption
> + * here - notice it's not safe to rely on interrupt disabling to avoid
> + * preemption, since any cond_resched() or cond_resched_lock() might
> + * trigger a reschedule if the preempt count is 0 (for reference, see
> + * Documentation/locking/preempt-locking.rst). Some functions called
> + * from here want preempt disabled, so no point enabling it later.
> */
> local_irq_disable();
> preempt_disable_notrace();
>
> /*
> - * It's possible to come here directly from a panic-assertion and
> - * not have preempt disabled. Some functions called from here want
> - * preempt to be disabled. No point enabling it later though...
> - *
> * Only one CPU is allowed to execute the panic code from here. For
> * multiple parallel invocations of panic, all other CPUs either
> * stop themself or will wait until they are stopped by the 1st CPU
> @@ -266,73 +381,75 @@ void panic(const char *fmt, ...)
> kgdb_panic(buf);
>
> /*
> - * If we have crashed and we have a crash kernel loaded let it handle
> - * everything else.
> - * If we want to run this after calling panic_notifiers, pass
> - * the "crash_kexec_post_notifiers" option to the kernel.
> + * Here lies one of the most subtle parts of the panic path,
> + * the panic notifiers and their order with regards to kdump.
> + * We currently have 4 sets of notifiers:
> *
> - * Bypass the panic_cpu check and call __crash_kexec directly.
> + * - the hypervisor list is composed by callbacks that are related
> + * to warn the FW / hypervisor about panic, or non-invasive LED
> + * controlling functions - (hopefully) low-risk for kdump, should
> + * run early if possible.
> + *
> + * - the informational list is composed by functions dumping data
> + * like kernel offsets, device error registers or tracing buffer;
> + * also log flooding prevention callbacks fit in this list. It is
> + * relatively safe to run before kdump.
> + *
> + * - the pre_reboot list basically is everything else, all the
> + * callbacks that don't fit in the 2 previous lists. It should
> + * run *after* kdump if possible, as it contains high-risk
> + * functions that may break kdump.
> + *
> + * - we also have a 4th list of notifiers, the post_reboot
> + * callbacks. This is not strongly related to kdump since it's
> + * always executed late in the panic path, after the restart
> + * mechanism (if set); its goal is to provide a way for
> + * architecture code effectively power-off/disable the system.
> + *
> + * The kernel provides the "panic_notifiers_level" parameter
> + * to adjust the ordering in which these notifiers should run
> + * with regards to kdump - the default level is 2, so both the
> + * hypervisor and informational notifiers should execute before
> + * the __crash_kexec(); the info notifier won't run by default
> + * unless there's some kmsg_dumper() registered. For details
> + * about it, check Documentation/admin-guide/kernel-parameters.txt.
> + *
> + * Notice that the code relies in bits set/clear operations to
> + * determine the ordering, functions *_once() execute only one
> + * time, as their name implies. The goal is to prevent too much
> + * if conditionals and more confusion. Finally, regarding CPUs
> + * disabling: unless NO panic notifier executes before kdump,
> + * we always disable secondary CPUs before __crash_kexec() and
> + * the notifiers execute.
> */
> - if (!_crash_kexec_post_notifiers) {
> + order_panic_notifiers_and_kdump();
> +
> + /* If no level, we should kdump ASAP. */
> + if (!panic_notifiers_level)
> __crash_kexec(NULL);
>
> - /*
> - * Note smp_send_stop is the usual smp shutdown function, which
> - * unfortunately means it may not be hardened to work in a
> - * panic situation.
> - */
> - smp_send_stop();
> - } else {
> - /*
> - * If we want to do crash dump after notifier calls and
> - * kmsg_dump, we will need architecture dependent extra
> - * works in addition to stopping other CPUs.
> - */
> - crash_smp_send_stop();
> - }
> + crash_smp_send_stop();
> + panic_notifier_hypervisor_once(buf);
>
> - /*
> - * Run any panic handlers, including those that might need to
> - * add information to the kmsg dump output.
> - */
> - atomic_notifier_call_chain(&panic_hypervisor_list, PANIC_NOTIFIER, buf);
> - atomic_notifier_call_chain(&panic_info_list, PANIC_NOTIFIER, buf);
> - atomic_notifier_call_chain(&panic_pre_reboot_list, PANIC_NOTIFIER, buf);
> + if (panic_notifier_info_once(buf)) {
> + panic_print_sys_info(false);
> + kmsg_dump(KMSG_DUMP_PANIC);
> + }
>
> - panic_print_sys_info(false);
> + panic_notifier_pre_reboot_once(buf);
>
> - kmsg_dump(KMSG_DUMP_PANIC);
> + __crash_kexec(NULL);
>
> - /*
> - * If you doubt kdump always works fine in any situation,
> - * "crash_kexec_post_notifiers" offers you a chance to run
> - * panic_notifiers and dumping kmsg before kdump.
> - * Note: since some panic_notifiers can make crashed kernel
> - * more unstable, it can increase risks of the kdump failure too.
> - *
> - * Bypass the panic_cpu check and call __crash_kexec directly.
> - */
> - if (_crash_kexec_post_notifiers)
> - __crash_kexec(NULL);
> + panic_notifier_hypervisor_once(buf);
>
> -#ifdef CONFIG_VT
> - unblank_screen();
> -#endif
> - console_unblank();
> -
> - /*
> - * We may have ended up stopping the CPU holding the lock (in
> - * smp_send_stop()) while still having some valuable data in the console
> - * buffer. Try to acquire the lock then release it regardless of the
> - * result. The release will also print the buffers out. Locks debug
> - * should be disabled to avoid reporting bad unlock balance when
> - * panic() is not being callled from OOPS.
> - */
> - debug_locks_off();
> - console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> + if (panic_notifier_info_once(buf)) {
> + panic_print_sys_info(false);
> + kmsg_dump(KMSG_DUMP_PANIC);
> + }
>
> - panic_print_sys_info(true);
> + panic_notifier_pre_reboot_once(buf);
>
> + console_flushing();
> if (!panic_blink)
> panic_blink = no_blink;
>
> @@ -363,8 +480,7 @@ void panic(const char *fmt, ...)
> emergency_restart();
> }
>
> - atomic_notifier_call_chain(&panic_post_reboot_list,
> - PANIC_NOTIFIER, buf);
> + panic_notifier_post_reboot_once(buf);
>
> pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>
> @@ -383,6 +499,15 @@ void panic(const char *fmt, ...)
>
> EXPORT_SYMBOL(panic);
>
> +/*
> + * Helper used in the kexec code, to validate if any
> + * panic notifier is set to execute early, before kdump.
> + */
> +inline bool panic_notifiers_before_kdump(void)
> +{
> + return panic_notifiers_level || crash_kexec_post_notifiers;
> +}
> +
> /*
> * TAINT_FORCED_RMMOD could be a per-module flag but the module
> * is being removed anyway.
> @@ -692,6 +817,9 @@ core_param(panic, panic_timeout, int, 0644);
> core_param(panic_print, panic_print, ulong, 0644);
> core_param(pause_on_oops, pause_on_oops, int, 0644);
> core_param(panic_on_warn, panic_on_warn, int, 0644);
> +core_param(panic_notifiers_level, panic_notifiers_level, uint, 0644);
> +
> +/* DEPRECATED in favor of panic_notifiers_level */
> core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
>
> static int __init oops_setup(char *s)
> diff --git a/tools/testing/selftests/pstore/pstore_crash_test
> b/tools/testing/selftests/pstore/pstore_crash_test
> index 2a329bbb4aca..1e60ce4501aa 100755
> --- a/tools/testing/selftests/pstore/pstore_crash_test
> +++ b/tools/testing/selftests/pstore/pstore_crash_test
> @@ -25,6 +25,7 @@ touch $REBOOT_FLAG
> sync
>
> # cause crash
> -# Note: If you use kdump and want to see kmesg-* files after reboot, you should
> -# specify 'crash_kexec_post_notifiers' in 1st kernel's cmdline.
> +# Note: If you use kdump and want to see kmsg-* files after reboot, you should
> +# be sure that the parameter "panic_notifiers_level" is more than '2' (the
> +# default value for this parameter is '2') in the first kernel's cmdline.
> echo c > /proc/sysrq-trigger
> --
> 2.36.0
Powered by blists - more mailing lists