lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z0o9FMFZ9Y8HkvzJ@pavilion.home>
Date: Fri, 29 Nov 2024 23:15:48 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Will Deacon <will@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Marc Zyngier <maz@...nel.org>,
	Oliver Upton <oliver.upton@...ux.dev>,
	Ard Biesheuvel <ardb@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 11/21] arm64: Keep first mismatched 32bits el0 capable
 CPU online through its callbacks

Le Thu, Nov 28, 2024 at 04:26:19PM +0000, Will Deacon a écrit :
> Hi Frederic,
> 
> On Tue, Nov 12, 2024 at 03:22:35PM +0100, Frederic Weisbecker wrote:
> > +static int mismatched_32bit_el0_offline(unsigned int cpu)
> > +{
> > +	return cpu == cpu_32bit_unofflineable ? -EBUSY : 0;
> > +}
> 
> I think this is far too late. The reason we prevent hot-unplug of the
> last 32-bit CPU is because 32-bit tasks need somewhere to run. By the
> time our offline notifier runs, those tasks have already been migrated.
> 
> On my setup, this explodes because that migration fails (as expected):

Duh!

> 
> 
> [  125.954586] ------------[ cut here ]------------
> [  125.955661] kernel BUG at kernel/sched/core.c:3501!
> [  125.957585] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> [  125.959371] Modules linked in:
> [  125.961850] CPU: 2 UID: 0 PID: 27 Comm: migration/2 Not tainted 6.12.0-00001-ge7689036c862-dirty #10
> [  125.963711] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [  125.964859] Stopper: __balance_push_cpu_stop+0x0/0x134 <- balance_push+0x118/0x1ac
> [  125.968507] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [  125.969648] pc : select_fallback_rq+0x2f4/0x2f8
> [  125.970477] lr : select_fallback_rq+0x198/0x2f8
> [  125.971273] sp : ffff800080153d20
> [  125.971897] x29: ffff800080153d30 x28: 0000000000000002 x27: ffffddbd31d79d30
> [  125.973416] x26: 0000000000000001 x25: ffffddbd31d79c70 x24: 0000000000000008
> [  125.974436] x23: ffffddbd31d79000 x22: ffffddbd31a77190 x21: 0000000000000004
> [  125.975452] x20: ffff0000c506a280 x19: 0000000000000000 x18: ffffddbd30fb1df0
> [  125.976467] x17: 0000000000000000 x16: 0000000000000001 x15: 000000000018132d
> [  125.977490] x14: 00000000ffffffe0 x13: 0000040000000000 x12: 00000c0000000000
> [  125.978499] x11: 000000002ae00002 x10: 000000000000000c x9 : 0000000000000040
> [  125.979507] x8 : 0000000000000000 x7 : 000000000000000c x6 : 000000000000000c
> [  125.980501] x5 : ffff0000c506a578 x4 : ffffddbd2fe53eb0 x3 : 0000000000000010
> [  125.981508] x2 : 0000000000000004 x1 : 0000000000000004 x0 : 0000000000000004
> [  125.982671] Call trace:
> [  125.983065]  select_fallback_rq+0x2f4/0x2f8
> [  125.983550]  __balance_push_cpu_stop+0x94/0x134
> [  125.983983]  cpu_stopper_thread+0xbc/0x174
> [  125.984352]  smpboot_thread_fn+0x1e4/0x24c
> [  125.984732]  kthread+0xfc/0x184
> [  125.985065]  ret_from_fork+0x10/0x20
> [  125.985741] Code: 9000d9c0 91306000 9441667a 17ffffef (d4210000) 
> [  125.986445] ---[ end trace 0000000000000000 ]---
> 
> 
> As I mentioned before, I think we need to turn this the other way around
> so that the non-unpluggable 32-bit core is forced to be a housekeeping]
> CPU. You said it was hard to revert CPUs from being treated as nohz_full,
> but I'm wondering whether we can prevent it from being added in the first
> place. The arch code has fingers in all the early boot pies.

So the mismatch is detected (and the unpluggableability forced) late on boot,
during secondary boot up. OTOH housekeeping is set up much earlier, on kernel
parameters parsing. Would it be possible to detect at this stage if a
possible yet non-booted CPU doesn't support 32bits el0?


> Yet another option (which I'm not hugely fond of, but may be ok) would
> be to treat 32-bit-capable nohz_full CPUs as being 64-bit-only when
> 'allow_mismatched_32bit_el0' is enabled (and documenting this as a
> limitation).

There is that too indeed.

Thanks.

> 
> Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ