linux-kernel - RE: [PATCH v2] panic: call panic handlers before panic_other_cpus

Message-ID:
 <BN7PR02MB41481BB6067A7265A459AF69D4C02@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Mon, 24 Feb 2025 19:59:28 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Dexuan Cui
	<decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, Haiyang Zhang
	<haiyangz@...rosoft.com>, Petr Mladek <pmladek@...e.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	John Ogness <john.ogness@...utronix.de>, Jani Nikula <jani.nikula@...el.com>,
	Baoquan He <bhe@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ryo
 Takakura <takakura@...inux.co.jp>
Subject: RE: [PATCH v2] panic: call panic handlers before
 panic_other_cpus_shutdown()

From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Monday, February 24, 2025 6:49 AM
> 
> On Fri, Feb 21, 2025 at 11:01:09PM +0000, Michael Kelley wrote:
> > From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February 21, 2025 1:31 PM
> > >
> > > Since, the panic handlers may require certain cpus to be online to panic
> > > gracefully, we should call them before turning off SMP. Without this
> > > re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
> > > vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
> > > is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
> > > crash_smp_send_stop() before the vmbus channel can be deconstructed.
> >
> > Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
> > works today? That code is aware of the problem that the unload response comes
> > only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
> > the interrupt. So the code polls the message page of each CPU to try to get the
> > unload response message. Is there a scenario where that approach isn't working?
> >
> 
> It doesn't work on arm64 (if the crashing cpu isn't VMBUS_CONNECT_CPU), it
> always ends up at "VMBus UNLOAD did not complete" without fail. It seems
> like arm64's crash_smp_send_stop() is more aggressive than x86's.

FWIW, I tested on a D16plds_v6 arm64 VM in Azure, running Ubuntu 20.04 with
a linux-next20252021 kernel. I caused a panic with "echo c >/proc/sysrq-trigger",
using "taskset" to make sure the panic was triggered on a CPU other than CPU 0.
I didn't see any problem. The panic code path completed quickly, and there were
no messages from vmbus_wait_for_unload(), including none of the periodic
"Waiting for unload" messages. I tried initiating the panic on several different
CPUs (4, 7, and 15) with the same result. I tested with kdump disabled and with
kdump enabled, both with no problems.
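
For anyone who wants to repeat the test described above, it can be sketched
roughly as follows. This is an illustration, not the exact commands that were
run: panic_cmd is a hypothetical helper, and the script only prints the
command for each CPU rather than executing it, since running one crashes the
machine. It assumes a root shell and that sysrq is enabled.

```shell
#!/bin/sh
# Sketch only: build the command that triggers a sysrq crash from a chosen CPU.
# panic_cmd is a hypothetical helper name, not something from this thread.
panic_cmd() {
    cpu="$1"
    # "taskset -c" pins the child shell to the given CPU, so the write to
    # /proc/sysrq-trigger, and therefore the panic, happens on that CPU.
    printf "taskset -c %s sh -c 'echo c > /proc/sysrq-trigger'\n" "$cpu"
}

# Dry run: print the commands for the CPUs mentioned above instead of
# running them. Running any one of them panics the system immediately.
for cpu in 4 7 15; do
    panic_cmd "$cpu"
done
```

To actually trigger the panic, run one of the printed commands as root, once
with kdump disabled and once with kdump enabled.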

So I think the current vmbus_wait_for_unload() code works on arm64, at least
in some ordinary scenarios. Are there any key differences in the configuration
or test environment when you see the "did not complete" message?

Michael