[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241112121843.GF6497@noisy.programming.kicks-ass.net>
Date: Tue, 12 Nov 2024 13:18:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Patryk Wlazlyn <patryk.wlazlyn@...ux.intel.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
rafael.j.wysocki@...el.com, len.brown@...el.com,
artem.bityutskiy@...ux.intel.com, dave.hansen@...ux.intel.com
Subject: Re: [PATCH v3 2/3] x86/smp native_play_dead: Prefer
cpuidle_play_dead() over mwait_play_dead()
On Tue, Nov 12, 2024 at 01:03:49PM +0100, Rafael J. Wysocki wrote:
> On Tue, Nov 12, 2024 at 12:47 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Fri, Nov 08, 2024 at 01:29:08PM +0100, Patryk Wlazlyn wrote:
> > > The generic implementation, based on cpuid leaf 0x5, for looking up the
> > > mwait hint for the deepest cstate, depends on them to be continuous in
> > > range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86
> > > platforms, it is not architectural and may not result in reaching the
> > > most optimized idle state on some of them.
> > >
> > > Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> > > fallback to the later in case of missing enter_dead() handler.
> > >
> > > Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@...ux.intel.com>
> > > ---
> > > arch/x86/kernel/smpboot.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index 44c40781bad6..721bb931181c 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -1416,9 +1416,9 @@ void native_play_dead(void)
> > > play_dead_common();
> > > tboot_shutdown(TB_SHUTDOWN_WFS);
> > >
> > > - mwait_play_dead();
> > > if (cpuidle_play_dead())
> > > - hlt_play_dead();
> > > + mwait_play_dead();
> > > + hlt_play_dead();
> > > }
> >
> > Yeah, I don't think so. we don't want to accidentally hit
> > acpi_idle_play_dead().
>
> Fair enough.
>
> Then we are back to the original approach though:
>
> https://lore.kernel.org/linux-pm/20241029101507.7188-3-patryk.wlazlyn@linux.intel.com/
Well, that won't be brilliant for hybrid systems where the available
states are different per CPU.
Also, all of this is a bit of a trainwreck... AFAICT AMD wants IO based
idle (per the 2018 commit). So they want the ACPI thing.
But on Intel we really don't want HLT, and had that MWAIT, but that has
real problems with KEXEC. And I don't think we can rely on INTEL_IDLE=y.
The ACPI thing doesn't support FFh states for it's enter_dead(), should
it?
Anyway, ideally x86 would grow a new instruction to offline a CPU, both
MWAIT and HLT have problems vs non-maskable interrupts.
I really don't know what is best here, maybe moving that whole CPUID
loop to boot, store the value in a per-cpu mwait_play_dead_hint. Have
AMD explicitly clear the value, and avoid mwait when 0 -- hint 0 is
equal to HLT anyway.
But as said, we need a new instruction.
Powered by blists - more mailing lists