[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241209162322.GE12428@willie-the-truck>
Date: Mon, 9 Dec 2024 16:23:22 +0000
From: Will Deacon <will@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Catalin Marinas <catalin.marinas@....com>,
Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>,
Ard Biesheuvel <ardb@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 11/21] arm64: Keep first mismatched 32bits el0 capable
CPU online through its callbacks
On Fri, Nov 29, 2024 at 11:15:48PM +0100, Frederic Weisbecker wrote:
> Le Thu, Nov 28, 2024 at 04:26:19PM +0000, Will Deacon a écrit :
> > Hi Frederic,
> >
> > On Tue, Nov 12, 2024 at 03:22:35PM +0100, Frederic Weisbecker wrote:
> > > +static int mismatched_32bit_el0_offline(unsigned int cpu)
> > > +{
> > > + return cpu == cpu_32bit_unofflineable ? -EBUSY : 0;
> > > +}
> >
> > I think this is far too late. The reason we prevent hot-unplug of the
> > last 32-bit CPU is because 32-bit tasks need somewhere to run. By the
> > time our offline notifier runs, those tasks have already been migrated.
> >
> > On my setup, this explodes because that migration fails (as expected):
>
> Duh!
>
> >
> >
> > [ 125.954586] ------------[ cut here ]------------
> > [ 125.955661] kernel BUG at kernel/sched/core.c:3501!
> > [ 125.957585] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > [ 125.959371] Modules linked in:
> > [ 125.961850] CPU: 2 UID: 0 PID: 27 Comm: migration/2 Not tainted 6.12.0-00001-ge7689036c862-dirty #10
> > [ 125.963711] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [ 125.964859] Stopper: __balance_push_cpu_stop+0x0/0x134 <- balance_push+0x118/0x1ac
> > [ 125.968507] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [ 125.969648] pc : select_fallback_rq+0x2f4/0x2f8
> > [ 125.970477] lr : select_fallback_rq+0x198/0x2f8
> > [ 125.971273] sp : ffff800080153d20
> > [ 125.971897] x29: ffff800080153d30 x28: 0000000000000002 x27: ffffddbd31d79d30
> > [ 125.973416] x26: 0000000000000001 x25: ffffddbd31d79c70 x24: 0000000000000008
> > [ 125.974436] x23: ffffddbd31d79000 x22: ffffddbd31a77190 x21: 0000000000000004
> > [ 125.975452] x20: ffff0000c506a280 x19: 0000000000000000 x18: ffffddbd30fb1df0
> > [ 125.976467] x17: 0000000000000000 x16: 0000000000000001 x15: 000000000018132d
> > [ 125.977490] x14: 00000000ffffffe0 x13: 0000040000000000 x12: 00000c0000000000
> > [ 125.978499] x11: 000000002ae00002 x10: 000000000000000c x9 : 0000000000000040
> > [ 125.979507] x8 : 0000000000000000 x7 : 000000000000000c x6 : 000000000000000c
> > [ 125.980501] x5 : ffff0000c506a578 x4 : ffffddbd2fe53eb0 x3 : 0000000000000010
> > [ 125.981508] x2 : 0000000000000004 x1 : 0000000000000004 x0 : 0000000000000004
> > [ 125.982671] Call trace:
> > [ 125.983065] select_fallback_rq+0x2f4/0x2f8
> > [ 125.983550] __balance_push_cpu_stop+0x94/0x134
> > [ 125.983983] cpu_stopper_thread+0xbc/0x174
> > [ 125.984352] smpboot_thread_fn+0x1e4/0x24c
> > [ 125.984732] kthread+0xfc/0x184
> > [ 125.985065] ret_from_fork+0x10/0x20
> > [ 125.985741] Code: 9000d9c0 91306000 9441667a 17ffffef (d4210000)
> > [ 125.986445] ---[ end trace 0000000000000000 ]---
> >
> >
> > As I mentioned before, I think we need to turn this the other way around
> > so that the non-unpluggable 32-bit core is forced to be a housekeeping]
> > CPU. You said it was hard to revert CPUs from being treated as nohz_full,
> > but I'm wondering whether we can prevent it from being added in the first
> > place. The arch code has fingers in all the early boot pies.
>
> So the mismatch is detected (and the unpluggableability forced) late on boot,
> during secondary boot up. OTOH housekeeping is set up much earlier, on kernel
> parameters parsing. Would it be possible to detect at this stage if a
> possible yet non-booted CPU doesn't support 32bits el0?
Sadly not; we need the CPU to enter the kernel so that we can inspect its
ID registers. We _do_ do this before the CPU is marked as online, but we
can't tell before it's actually up and running.
> > Yet another option (which I'm not hugely fond of, but may be ok) would
> > be to treat 32-bit-capable nohz_full CPUs as being 64-bit-only when
> > 'allow_mismatched_32bit_el0' is enabled (and documenting this as a
> > limitation).
>
> There is that too indeed.
Maybe that's what we do then (but *please* update 'asymmetric-32bit.rst').
Will
Powered by blists - more mailing lists