Message-ID:
 <SN6PR02MB4157D993CCE04F2D46E2B8A1D4C72@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 21 Feb 2025 23:01:09 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: Dexuan Cui <decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, Haiyang Zhang
	<haiyangz@...rosoft.com>, Petr Mladek <pmladek@...e.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	John Ogness <john.ogness@...utronix.de>, Jani Nikula <jani.nikula@...el.com>,
	Baoquan He <bhe@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ryo
 Takakura <takakura@...inux.co.jp>
Subject: RE: [PATCH v2] panic: call panic handlers before
 panic_other_cpus_shutdown()

From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February 21, 2025 1:31 PM
> 
> Since the panic handlers may require certain cpus to be online to panic
> gracefully, we should call them before turning off SMP. Without this
> re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
> vmbus channel is bound to VMBUS_CONNECT_CPU and, unless the crashing cpu
> is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
> crash_smp_send_stop() before the vmbus channel can be torn down.
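The ordering argument in the patch description can be modeled in user space. This is a minimal sketch under stated assumptions. NR_MODEL_CPUS, MODEL_CONNECT_CPU (a stand-in for VMBUS_CONNECT_CPU) and the model_* functions are hypothetical illustrative names, not kernel APIs. The modeled handler also assumes it cannot finish unless the connect CPU is online. The model deliberately leaves out the polling fallback discussed in the reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 4 CPUs, CPU 0 plays the role of VMBUS_CONNECT_CPU. */
#define NR_MODEL_CPUS     4
#define MODEL_CONNECT_CPU 0

static bool cpu_online[NR_MODEL_CPUS];

static void model_bring_all_online(void)
{
	for (int i = 0; i < NR_MODEL_CPUS; i++)
		cpu_online[i] = true;
}

/* Stand-in for panic_other_cpus_shutdown(): all CPUs but the panicking one go offline. */
static void model_shutdown_other_cpus(int panic_cpu)
{
	assert(panic_cpu >= 0 && panic_cpu < NR_MODEL_CPUS);
	for (int i = 0; i < NR_MODEL_CPUS; i++)
		if (i != panic_cpu)
			cpu_online[i] = false;
}

/*
 * Stand-in for a panic notifier that (by assumption) needs the connect CPU
 * to receive the unload response. Returns false where it would time out.
 */
static bool model_unload_handler(void)
{
	return cpu_online[MODEL_CONNECT_CPU];
}

/* Simulate panic() on panic_cpu; notifiers_first selects the patched order. */
static bool model_panic(int panic_cpu, bool notifiers_first)
{
	bool ok;

	model_bring_all_online();
	if (notifiers_first) {
		ok = model_unload_handler();          /* patched order */
		model_shutdown_other_cpus(panic_cpu);
	} else {
		model_shutdown_other_cpus(panic_cpu); /* original order */
		ok = model_unload_handler();
	}
	return ok;
}
```

Suppose CPU 2 panics. model_panic(2, false) returns false, because CPU 0 is already offline when the handler runs. model_panic(2, true) returns true. If the panicking CPU is the connect CPU itself, both orders succeed. That matches the "unless the crashing cpu is the same as VMBUS_CONNECT_CPU" condition.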

Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
works today? That code is aware of the problem that the unload response comes
only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
the interrupt. So the code polls the message page of each CPU to try to get the
unload response message. Is there a scenario where that approach isn't working?
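The polling approach described above can be sketched the same way. The sketch assumes one message slot per CPU, which the host writes even when that CPU never services the interrupt. Several details are assumptions for illustration: the message type values, the delivery schedule in model_host_tick(), and all model_* names. This is not the code of vmbus_wait_for_unload(). A real loop would also pause between iterations.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4

/* Hypothetical message types; values are illustrative, not Hyper-V's. */
enum model_msg_type { MSG_NONE = 0, MSG_UNLOAD_RESPONSE = 1 };

struct model_msg_page {
	enum model_msg_type type;
};

/* One message page per CPU, written by the "host" regardless of interrupts. */
static struct model_msg_page pages[NR_MODEL_CPUS];

/* Hypothetical host: posts the response to target_cpu's page after `delay` polls. */
static void model_host_tick(int poll, int delay, int target_cpu)
{
	if (poll == delay)
		pages[target_cpu].type = MSG_UNLOAD_RESPONSE;
}

/*
 * Instead of waiting for an interrupt on target_cpu, scan every CPU's
 * message page on each iteration. Returns the CPU whose page held the
 * response, or -1 if max_polls iterations pass without one.
 */
static int model_wait_for_unload(int target_cpu, int host_delay, int max_polls)
{
	assert(target_cpu >= 0 && target_cpu < NR_MODEL_CPUS);
	for (int i = 0; i < NR_MODEL_CPUS; i++)
		pages[i].type = MSG_NONE;

	for (int poll = 0; poll < max_polls; poll++) {
		model_host_tick(poll, host_delay, target_cpu);
		for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
			if (pages[cpu].type == MSG_UNLOAD_RESPONSE) {
				pages[cpu].type = MSG_NONE; /* consume the message */
				return cpu;
			}
		}
	}
	return -1;
}
```

model_wait_for_unload(2, 0, 1) finds the response on CPU 2 at the first poll. model_wait_for_unload(3, 50, 10) returns -1, because the simulated host answers only after the poll budget is spent.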

Note also that Hyper-V itself can take a long time (tens of seconds) to respond
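Because the host may take tens of seconds to respond, a polling loop has to be sized for that latency. The arithmetic below is a sketch with illustrative constants: a 10 ms poll interval, a 100 s total budget and a progress message every 5 s. These values are assumptions, not taken from the kernel source. Check the actual limits in vmbus_wait_for_unload().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not taken from the kernel source. */
#define MODEL_DELAY_UNIT_MS 10           /* pause between polls */
#define MODEL_WAIT_MS       (100 * 1000) /* total budget: 100 s */
#define MODEL_MSG_MS        (5 * 1000)   /* progress message every 5 s */

#define MODEL_WAIT_LOOPS (MODEL_WAIT_MS / MODEL_DELAY_UNIT_MS)
#define MODEL_MSG_LOOPS  (MODEL_MSG_MS / MODEL_DELAY_UNIT_MS)

/* Would a host that answers after host_ms be seen before the budget runs out? */
static bool model_response_in_budget(long host_ms)
{
	/* Round up: a response landing mid-interval is seen on the next poll. */
	long loops_needed = (host_ms + MODEL_DELAY_UNIT_MS - 1) / MODEL_DELAY_UNIT_MS;

	return loops_needed <= MODEL_WAIT_LOOPS;
}

/* Should iteration `loop` (1-based) emit a "still waiting" message? */
static bool model_progress_due(long loop)
{
	assert(loop >= 1);
	return loop % MODEL_MSG_LOOPS == 0;
}
```

With these numbers, the loop runs up to 10,000 iterations and reports progress every 500. A 30 s host response fits within the budget, but a 150 s response does not.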
to the unload request. See the comments in vmbus_wait_for_unload() about
flushing the Azure host disk cache. I worked on this code and did the
measurements, so I have some familiarity with the problems. :-)

Michael

> 
> Signed-off-by: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
> ---
> v2: keep printk_legacy_allow_panic_sync() after
>     panic_other_cpus_shutdown().
> ---
>  kernel/panic.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..433cf651e213 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -372,16 +372,16 @@ void panic(const char *fmt, ...)
>  	if (!_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
> 
> -	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> -
> -	printk_legacy_allow_panic_sync();
> -
>  	/*
>  	 * Run any panic handlers, including those that might need to
>  	 * add information to the kmsg dump output.
>  	 */
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> 
> +	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> +
> +	printk_legacy_allow_panic_sync();
> +
>  	panic_print_sys_info(false);
> 
>  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> --
> 2.47.1
> 

