Message-ID:
 <BN7PR02MB41481BB6067A7265A459AF69D4C02@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Mon, 24 Feb 2025 19:59:28 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Dexuan Cui
	<decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, Haiyang Zhang
	<haiyangz@...rosoft.com>, Petr Mladek <pmladek@...e.com>, Andrew Morton
	<akpm@...ux-foundation.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	John Ogness <john.ogness@...utronix.de>, Jani Nikula <jani.nikula@...el.com>,
	Baoquan He <bhe@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ryo
 Takakura <takakura@...inux.co.jp>
Subject: RE: [PATCH v2] panic: call panic handlers before
 panic_other_cpus_shutdown()

From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Monday, February 24, 2025 6:49 AM
> 
> On Fri, Feb 21, 2025 at 11:01:09PM +0000, Michael Kelley wrote:
> > From: Hamza Mahfooz <hamzamahfooz@...ux.microsoft.com> Sent: Friday, February 21, 2025 1:31 PM
> > >
> > > Since the panic handlers may require certain cpus to be online to panic
> > > gracefully, we should call them before turning off SMP. Without this
> > > re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
> > > vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
> > > is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
> > > crash_smp_send_stop() before the vmbus channel can be deconstructed.
> >
> > Hamza -- what specifically is the problem with the way vmbus_wait_for_unload()
> > works today? That code is aware of the problem that the unload response comes
> > only on the VMBUS_CONNECT_CPU, and that cpu may not be able to handle
> > the interrupt. So the code polls the message page of each CPU to try to get the
> > unload response message. Is there a scenario where that approach isn't working?
> >
> 
> It doesn't work on arm64 (if the crashing cpu isn't VMBUS_CONNECT_CPU), it
> always ends up at "VMBus UNLOAD did not complete" without fail. It seems
> like arm64's crash_smp_send_stop() is more aggressive than x86's.

FWIW, I tested on a D16plds_v6 arm64 VM in Azure, running Ubuntu 20.04 with
a linux-next20252021 kernel. I caused a panic with "echo c >/proc/sysrq-trigger",
using "taskset" to make sure the panic is triggered on a CPU other than CPU 0.
I didn't see any problem. The panic code path completed quickly, and there were
no messages from vmbus_wait_for_unload(), including none of the periodic
"Waiting for unload" messages. I tried initiating the panic on several different
CPUs (4, 7, and 15) with the same result. I tested with kdump disabled and with
kdump enabled, both with no problems.
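
The procedure above can be sketched roughly as follows. This is a minimal
sketch, not the exact commands used in the test: the helper name
panic_on_cpu and the DRY_RUN variable are hypothetical, and the sketch only
prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: trigger a sysrq crash from a chosen CPU.
# panic_on_cpu and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN=1 (the default here) the command is printed, not executed.

panic_on_cpu() {
    cpu="$1"
    # taskset -c pins the child shell to the given CPU, so the write to
    # /proc/sysrq-trigger (and therefore the panic) originates on that CPU.
    cmd="taskset -c $cpu sh -c 'echo c > /proc/sysrq-trigger'"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$cmd"
    else
        # Requires root. This crashes the machine immediately.
        taskset -c "$cpu" sh -c 'echo c > /proc/sysrq-trigger'
    fi
}

# Dry run for CPU 4, one of the CPUs mentioned above.
panic_on_cpu 4
```

Running the same helper with 7 or 15 covers the other CPUs tried. Any CPU
other than CPU 0 exercises the case where the crashing CPU is not the one
the VMBus connection is bound to.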

So I think the current vmbus_wait_for_unload() code works on arm64, at least
in some ordinary scenarios. Are there any key differences in the configuration
or test environment when you see the "did not complete" message?

Michael
