[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzSQcq5JxGgKVh5Z@BLRRASHENOY1.amd.com>
Date: Wed, 13 Nov 2024 17:11:38 +0530
From: "Gautham R. Shenoy" <gautham.shenoy@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Patryk Wlazlyn <patryk.wlazlyn@...ux.intel.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
rafael.j.wysocki@...el.com, len.brown@...el.com,
artem.bityutskiy@...ux.intel.com, dave.hansen@...ux.intel.com
Subject: Re: [PATCH v3 2/3] x86/smp native_play_dead: Prefer
cpuidle_play_dead() over mwait_play_dead()
On Tue, Nov 12, 2024 at 03:56:18PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 12, 2024 at 02:23:14PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Nov 12, 2024 at 12:47 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Fri, Nov 08, 2024 at 01:29:08PM +0100, Patryk Wlazlyn wrote:
> > > > The generic implementation, based on cpuid leaf 0x5, for looking up the
> > > > mwait hint for the deepest cstate, depends on them to be continuous in
> > > > range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86
> > > > platforms, it is not architectural and may not result in reaching the
> > > > most optimized idle state on some of them.
> > > >
> > > > Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> > > > fallback to the later in case of missing enter_dead() handler.
> > > >
> > > > Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@...ux.intel.com>
> > > > ---
> > > > arch/x86/kernel/smpboot.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > > index 44c40781bad6..721bb931181c 100644
> > > > --- a/arch/x86/kernel/smpboot.c
> > > > +++ b/arch/x86/kernel/smpboot.c
> > > > @@ -1416,9 +1416,9 @@ void native_play_dead(void)
> > > > play_dead_common();
> > > > tboot_shutdown(TB_SHUTDOWN_WFS);
> > > >
> > > > - mwait_play_dead();
> > > > if (cpuidle_play_dead())
> > > > - hlt_play_dead();
> > > > + mwait_play_dead();
> > > > + hlt_play_dead();
> > > > }
> > >
> > > Yeah, I don't think so. we don't want to accidentally hit
> > > acpi_idle_play_dead().
> >
> > Having inspected the code once again, I'm not sure what your concern is.
> >
> > :enter.dead() is set to acpi_idle_play_dead() for all states in ACPI
> > idle - see acpi_processor_setup_cstates() and the role of the type
> > check is to filter out bogus table entries (the "type" must be 1, 2,
> > or 3 as per the spec).
> >
> > Then cpuidle_play_dead() calls drv->states[i].enter_dead() for the
> > deepest state where it is set and if this is FFH,
> > acpi_idle_play_dead() will return an error. So after the change, the
> > code above will fall back to mwait_play_dead() then.
> >
> > Or am I missing anything?
>
> So it relies on there being a C2/C3 state enumerated and that being FFh.
> Otherwise it will find a 'working' state and we're up a creek.
>
> Typically I expect C2/C3 FFh states will be there on Intel stuff, but it
> seems awefully random to rely on this hole. AMD might unwittinly change
> the ACPI driver (they're the main user) and then we'd be up a creek.
AMD platforms won't be using FFH based states for offlined CPUs. We
prefer IO based states when available, and HLT otherwise.
>
> Robustly we'd teach the ACPI driver about FFh and set enter_dead on
> every state -- but we'd have to double check that with AMD.
Works for us as long as those FFh states aren't used for play_dead on
AMD platforms.
How about something like this (completely untested)
---------------------x8----------------------------------------------------
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index f3ffd0a3a012..bd611771fa6c 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -215,6 +215,24 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter);
+static int acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
+{
+ unsigned int cpu = smp_processor_id();
+ struct cstate_entry *percpu_entry;
+
+ /*
+ * This is ugly. But AMD processors don't prefer MWAIT based
+ * C-states when processors are offlined.
+ */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+ return -ENODEV;
+
+ percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+ return mwait_play_dead_with_hints(percpu_entry->states[cx->index].eax);
+}
+EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead);
+
static int __init ffh_cstate_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 831fa4a12159..c535f5df9081 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -590,6 +590,8 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
raw_safe_halt();
else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
io_idle(cx->address);
+ } else if (cx->entry_method == ACPI_CSTATE_FFH) {
+ return acpi_procesor_ffh_play_dead(cx);
} else
return -ENODEV;
}
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index e6f6074eadbf..38329dcdd2b9 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
struct acpi_processor_cx *cx,
struct acpi_power_register *reg);
void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
+int acpi_processor_ffh_dead(struct acpi_processor_cx *cstate);
#else
static inline void acpi_processor_power_init_bm_check(struct
acpi_processor_flags
@@ -300,6 +301,11 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
{
return;
}
+
+static inline acpi_processor_ffh_dead(struct acpi_processor_cx *cstate)
+{
+ return -ENODEV;
+}
#endif
static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,
---------------------x8--------------------------------------------------------
--
Thanks and Regards
gautham.
Powered by blists - more mailing lists