Message-ID:
<BN7PR02MB41481BB6067A7265A459AF69D4C02@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Mon, 24 Feb 2025 19:59:28 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Dexuan Cui
<decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, Haiyang Zhang
<haiyangz@...rosoft.com>, Petr Mladek <pmladek@...e.com>, Andrew Morton
<akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
John Ogness <john.ogness@...utronix.de>, Jani Nikula <jani.nikula@...el.com>,
Baoquan He <bhe@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ryo
Takakura <takakura@...inux.co.jp>
Subject: RE: [PATCH v2] panic: call panic handlers before
panic_other_cpus_shutdown()
From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Monday, February 24, 2025 6:49 AM
>
> On Fri, Feb 21, 2025 at 11:01:09PM +0000, Michael Kelley wrote:
> > From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February
> 21, 2025 1:31 PM
> > >
> > > Since, the panic handlers may require certain cpus to be online to panic
> > > gracefully, we should call them before turning off SMP. Without this
> > > re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
> > > vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
> > > is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
> > > crash_smp_send_stop() before the vmbus channel can be deconstructed.
> >
> > Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
> > works today? That code is aware of the problem that the unload response comes
> > only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
> > the interrupt. So the code polls the message page of each CPU to try to get the
> > unload response message. Is there a scenario where that approach isn't working?
> >
>
> It doesn't work on arm64 (if the crashing cpu isn't VMBUS_CONNECT_CPU), it
> always ends up at "VMBus UNLOAD did not complete" without fail. It seems
> like arm64's crash_smp_send_stop() is more aggressive than x86's.

FWIW, I tested on a D16plds_v6 arm64 VM in Azure, running Ubuntu 20.04 with
a linux-next20252021 kernel. I caused a panic with "echo c >/proc/sysrq-trigger",
using "taskset" to make sure the panic was triggered on a CPU other than CPU 0.
I didn't see any problem. The panic code path completed quickly, and there were
no messages from vmbus_wait_for_unload(), including none of the periodic
"Waiting for unload" messages. I tried initiating the panic on several different
CPUs (4, 7, and 15) with the same result. I tested with kdump disabled and with
kdump enabled, both with no problems.
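
For anyone wanting to repeat the test, the steps can be sketched roughly as
below. This is only a minimal sketch: panic_cmd is a hypothetical helper name
(not something from the kernel tree), it assumes VMBUS_CONNECT_CPU is CPU 0,
and it only prints the commands rather than running them, since running one
as root crashes the machine immediately.

```shell
#!/bin/sh
# Print (do NOT run) the command that triggers a sysrq crash pinned to one CPU.
# panic_cmd is a hypothetical helper used for illustration only.
# Assumption: CPU 0 is VMBUS_CONNECT_CPU, so CPU 0 is rejected here.
panic_cmd() {
    cpu="$1"
    # Accept only a non-empty string of decimal digits.
    case "$cpu" in
        ''|*[!0-9]*)
            echo "usage: panic_cmd <cpu-number>" >&2
            return 2
            ;;
    esac
    if [ "$cpu" -eq 0 ]; then
        echo "CPU 0 is assumed to be VMBUS_CONNECT_CPU; pick another CPU" >&2
        return 1
    fi
    # taskset -c pins the shell that writes 'c' to sysrq-trigger to that CPU.
    printf "taskset -c %s sh -c 'echo c > /proc/sysrq-trigger'\n" "$cpu"
}

# Print the commands for the CPUs tried in the test described above.
for cpu in 4 7 15; do
    panic_cmd "$cpu"
done
```

Each printed line can be pasted into a root shell on the test VM, once with
kdump disabled and once with kdump enabled, watching the console for any
output from vmbus_wait_for_unload().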

So I think the current vmbus_wait_for_unload() code works on arm64, at least
in some ordinary scenarios. Are there any key differences in the configuration
or test environment when you see the "did not complete" message?

Michael