linux-kernel - Re: [PATCH v2] panic: call panic handlers before panic_other_cpus

Message-ID: <Z7yGv_ZyeyUueXLz@hm-sls2>
Date: Mon, 24 Feb 2025 09:48:31 -0500
From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Dexuan Cui <decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Petr Mladek <pmladek@...e.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	John Ogness <john.ogness@...utronix.de>,
	Jani Nikula <jani.nikula@...el.com>, Baoquan He <bhe@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ryo Takakura <takakura@...inux.co.jp>
Subject: Re: [PATCH v2] panic: call panic handlers before
 panic_other_cpus_shutdown()

On Fri, Feb 21, 2025 at 11:01:09PM +0000, Michael Kelley wrote:
> From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February 21, 2025 1:31 PM
> > 
> > Since the panic handlers may require certain CPUs to be online to panic
> > gracefully, we should call them before turning off SMP. Without this
> > reordering, hv_panic_vmbus_unload() on Hyper-V times out, because the
> > vmbus channel is bound to VMBUS_CONNECT_CPU, and unless the crashing CPU
> > is VMBUS_CONNECT_CPU itself, crash_smp_send_stop() takes VMBUS_CONNECT_CPU
> > offline before the vmbus channel can be torn down.
> 
> Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
> works today? That code is aware of the problem that the unload response comes
> only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
> the interrupt. So the code polls the message page of each CPU to try to get the
> unload response message. Is there a scenario where that approach isn't working?
> 

It doesn't work on arm64 when the crashing CPU isn't VMBUS_CONNECT_CPU: it
always ends up at "VMBus UNLOAD did not complete". It seems like arm64's
crash_smp_send_stop() is more aggressive than x86's.

> Note also that Hyper-V itself can take a long time (tens of seconds) to respond
> to the unload request. See the comments in vmbus_wait_for_unload() about
> flushing the Azure host disk cache. I worked on this code and did the
> measurements, so I have some familiarity with the problems. :-)
> 
> Michael
> 
> > 
> > Signed-off-by: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
> > ---
> > v2: keep printk_legacy_allow_panic_sync() after
> >     panic_other_cpus_shutdown().
> > ---
> >  kernel/panic.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index fbc59b3b64d0..433cf651e213 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -372,16 +372,16 @@ void panic(const char *fmt, ...)
> >  	if (!_crash_kexec_post_notifiers)
> >  		__crash_kexec(NULL);
> > 
> > -	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> > -
> > -	printk_legacy_allow_panic_sync();
> > -
> >  	/*
> >  	 * Run any panic handlers, including those that might need to
> >  	 * add information to the kmsg dump output.
> >  	 */
> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> > 
> > +	panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
> > +
> > +	printk_legacy_allow_panic_sync();
> > +
> >  	panic_print_sys_info(false);
> > 
> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
> > --
> > 2.47.1
> > 
>