linux-kernel - RE: [PATCH v2] panic: call panic handlers before panic_other_cpus

Message-ID:
 <SN6PR02MB4157D993CCE04F2D46E2B8A1D4C72@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 21 Feb 2025 23:01:09 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: Dexuan Cui <decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, Haiyang Zhang
	<haiyangz@...rosoft.com>, Petr Mladek <pmladek@...e.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	John Ogness <john.ogness@...utronix.de>, Jani Nikula <jani.nikula@...el.com>,
	Baoquan He <bhe@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ryo
 Takakura <takakura@...inux.co.jp>
Subject: RE: [PATCH v2] panic: call panic handlers before
 panic_other_cpus_shutdown()

From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February 21, 2025 1:31 PM
> 
> Since the panic handlers may require certain cpus to be online to panic
> gracefully, we should call them before turning off SMP. Without this
> re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
> vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
> is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
> crash_smp_send_stop() before the vmbus channel can be deconstructed.

Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
works today? That code is aware of the problem that the unload response comes
only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
the interrupt. So the code polls the message page of each CPU to try to get the
unload response message. Is there a scenario where that approach isn't working?
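
For readers who don't have the driver in front of them, the polling idea can be sketched roughly as below. This is a userspace simulation, not the actual vmbus_wait_for_unload() code: every name here (sim_msg_page, sim_host, sim_wait_for_unload, SIM_MSG_UNLOAD_RESPONSE) is hypothetical, and the "host" is modeled as a response that appears on one CPU's page at a chosen poll iteration. The point is only that the waiter scans every CPU's page itself rather than relying on the target CPU to take the interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NR_CPUS             4
#define SIM_MSG_NONE            0u
#define SIM_MSG_UNLOAD_RESPONSE 17u	/* hypothetical value, for illustration only */

/* Hypothetical stand-in for one CPU's message page slot. */
struct sim_msg_page {
	uint32_t message_type;
};

/* Hypothetical model of the host: it posts the unload response on
 * target_cpu's page once the waiter reaches poll number deliver_at_poll. */
struct sim_host {
	size_t target_cpu;
	unsigned int deliver_at_poll;
};

/*
 * Poll every CPU's message page until the unload response shows up.
 * Returns the CPU whose page held the response, or -1 if max_polls
 * iterations pass without seeing it (the simulated timeout).
 */
static int sim_wait_for_unload(struct sim_msg_page *pages, size_t ncpus,
			       const struct sim_host *host,
			       unsigned int max_polls)
{
	assert(ncpus > 0 && host->target_cpu < ncpus);

	for (unsigned int poll = 0; poll < max_polls; poll++) {
		/* Simulated host side: the response lands on one CPU's page. */
		if (poll == host->deliver_at_poll)
			pages[host->target_cpu].message_type =
				SIM_MSG_UNLOAD_RESPONSE;

		/* Waiter side: scan all pages, since the target CPU may be
		 * stopped and unable to handle the message itself. */
		for (size_t cpu = 0; cpu < ncpus; cpu++) {
			if (pages[cpu].message_type == SIM_MSG_UNLOAD_RESPONSE) {
				pages[cpu].message_type = SIM_MSG_NONE;	/* consume it */
				return (int)cpu;
			}
		}
	}

	return -1;
}
```

In this sketch a response posted to CPU 2 at poll 3 is found by whichever CPU is running the loop, and a response that never arrives within max_polls shows up as a -1 return, which is the simulated analogue of the timeout being reported.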

Note also that Hyper-V itself can take a long time (tens of seconds) to respond
to the unload request. See the comments in vmbus_wait_for_unload() about
flushing the Azure host disk cache. I worked on this code and did the
measurements, so I have some familiarity with the problems. :-)

Michael

> 
> Signed-off-by: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
> ---
> v2: keep printk_legacy_allow_panic_sync() after
>     panic_other_cpus_shutdown().
> ---
>  kernel/panic.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..433cf651e213 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -372,16 +372,16 @@ void panic(const char *fmt, ...)
>  	if (!_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
> 
> -	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> -
> -	printk_legacy_allow_panic_sync();
> -
>  	/*
>  	 * Run any panic handlers, including those that might need to
>  	 * add information to the kmsg dump output.
>  	 */
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> 
> +	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> +
> +	printk_legacy_allow_panic_sync();
> +
>  	panic_print_sys_info(false);
> 
>  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> --
> 2.47.1
>