Date:   Thu, 14 Oct 2021 11:30:36 +0000
From:   "Huang, Ray" <Ray.Huang@....com>
To:     Giovanni Gherdovich <ggherdovich@...e.cz>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        Borislav Petkov <bp@...e.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
CC:     "Sharma, Deepak" <Deepak.Sharma@....com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "Limonciello, Mario" <Mario.Limonciello@....com>,
        "Fontenot, Nathan" <Nathan.Fontenot@....com>,
        "Su, Jinzhou (Joe)" <Jinzhou.Su@....com>,
        "Du, Xiaojian" <Xiaojian.Du@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH v2 21/21] Documentation: amd-pstate: add amd-pstate driver
 introduction

[AMD Official Use Only]

> -----Original Message-----
> From: Giovanni Gherdovich <ggherdovich@...e.cz>
> Sent: Thursday, October 14, 2021 12:23 AM
> To: Huang, Ray <Ray.Huang@....com>; Rafael J . Wysocki
> <rafael.j.wysocki@...el.com>; Viresh Kumar <viresh.kumar@...aro.org>;
> Shuah Khan <skhan@...uxfoundation.org>; Borislav Petkov <bp@...e.de>;
> Peter Zijlstra <peterz@...radead.org>; Ingo Molnar <mingo@...nel.org>;
> linux-pm@...r.kernel.org
> Cc: Sharma, Deepak <Deepak.Sharma@....com>; Deucher, Alexander
> <Alexander.Deucher@....com>; Limonciello, Mario
> <Mario.Limonciello@....com>; Fontenot, Nathan
> <Nathan.Fontenot@....com>; Su, Jinzhou (Joe) <Jinzhou.Su@....com>;
> Du, Xiaojian <Xiaojian.Du@....com>; linux-kernel@...r.kernel.org;
> x86@...nel.org
> Subject: Re: [PATCH v2 21/21] Documentation: amd-pstate: add amd-pstate
> driver introduction
> 
> On Sun, 2021-09-26 at 17:06 +0800, Huang Rui wrote:
> > Introduce the amd-pstate driver design and implementation.
> >
> > Signed-off-by: Huang Rui <ray.huang@....com>
> > ---
> >  Documentation/admin-guide/pm/amd_pstate.rst   | 377 ++++++++++++++++++
> >
> 
> [... snip ...]
> 
> > +
> > +AMD CPPC Performance Capability
> > +--------------------------------
> > +
> > +Highest Performance (RO)
> > +.........................
> > +
> > +It is the absolute maximum performance an individual processor may
> > +reach, assuming ideal conditions. This performance level may not be
> > +sustainable for long durations and may only be achievable if other
> > +platform components are in a specific state; for example, it may
> > +require other processors be in an idle state. This would be
> > +equivalent to the highest frequencies supported by the processor.
> > +
> > +Nominal (Guaranteed) Performance (RO)
> > +......................................
> > +
> > +It is the maximum sustained performance level of the processor,
> > +assuming ideal operating conditions. In absence of an external
> > +constraint (power, thermal, etc.) this is the performance level the
> > +processor is expected to be able to maintain continuously. All
> > +cores/processors are expected to be able to sustain their nominal
> > +performance state simultaneously.
> > +
> > +Lowest non-linear Performance (RO)
> > +...................................
> > +
> > +It is the lowest performance level at which nonlinear power savings
> > +are achieved, for example, due to the combined effects of voltage and
> > +frequency scaling. Above this threshold, lower performance levels
> > +should be generally more energy efficient than higher performance
> > +levels. This register effectively conveys the most efficient performance
> > +level to ``amd-pstate``.
> > +
> > +Lowest Performance (RO)
> > +........................
> > +
> > +It is the absolute lowest performance level of the processor.
> > +Selecting a performance level lower than the lowest nonlinear
> > +performance level may cause an efficiency penalty but should reduce
> > +the instantaneous power consumption of the processor.
> > +
> 
> Those above are the CPPC capabilities. All good so far. They're Read Only,
> and for each capability you have a file in sysfs. It makes sense to describe
> them in this Documentation folder ("admin-guide"). But the following
> section...
> 
> > +AMD CPPC Performance Control
> > +------------------------------
> > +
> > +``amd-pstate`` passes performance goals through these registers. The
> > +register drives the behavior of the desired performance target.
> > +
> > +Minimum requested performance (RW)
> > +...................................
> > +
> > +``amd-pstate`` specifies the minimum allowed performance level.
> > +
> > +Maximum requested performance (RW)
> > +...................................
> > +
> > +``amd-pstate`` specifies a limit on the maximum performance that is
> > +expected to be supplied by the hardware.
> > +
> > +Desired performance target (RW)
> > +...................................
> > +
> > +``amd-pstate`` specifies a desired target in the CPPC performance
> > +scale as a relative number. This can be expressed as percentage of
> > +nominal performance (infrastructure max). Below the nominal sustained
> > +performance level, desired performance expresses the average
> > +performance level of the processor subject to hardware. Above the
> > +nominal performance level, processor must provide at least nominal
> > +performance requested and go higher if current operating conditions
> > +allow.
> > +
> > +Energy Performance Preference (EPP) (RW)
> > +.........................................
> > +
> > +Provides a hint to the hardware if software wants to bias toward
> > +performance (0x0) or energy efficiency (0xff).
> 
> The section above describes the CPPC "performance controls". They're
> marked "Read/Write", but you don't expose them to the user via sysfs, am I
> right?

Yes, because we use the kernel governors to manage the "performance controls".

> 
> Do I understand correctly that with this driver, the AMD System Management
> Unit (SMU -- is it the right name?) is *not* working in autonomous mode, but
> is almost entirely under the OS control?
> 
> By "autonomous mode" I mean: you run a workload, the driver doesn't select
> any desired frequency, and the SMU does its thing and selects the CPU clock
> freq on its own. That's not what's happening here, AFAIU. I tried amd-pstate
> with the "userspace" governor (very useful for testing ;), and set
> frequencies like
> 
>   echo 1200000 > /sys/devices/system/cpu/cpufreq/policy11/scaling_setspeed
> 
> and then, whatever the load on CPU#11, "cpupower monitor" would show
> me a constant clock of ~1.2GHz.
> 
> Don't get me wrong, this is a very good driver! I'm super happy that the
> kernel can finally see all the P-States, instead of just 3.
> 
> I'm just trying to clarify that we're using CPPC with autonomous selection
> disabled, so I don't think the documentation in admin-guide should describe
> features like the R/W "performance controls" that don't make sense in this
> context. Especially the "Energy Performance Preference (EPP)", that you
> would use to tell the SMU "do what you want, just push a little on the
> performance side".

No problem! 😊 Actually, with this driver the kernel governor and the AMD SMU arbiter together manage the target frequency.
A kernel governor such as "schedutil" uses data from the Linux CFS scheduler to predict the workload and calculate the most reasonable desired performance value.
The amd-pstate driver then relies on this governor to pass the "performance controls" to the SMU CPU clock DPM arbiter, since the SMU firmware can detect the MSR operations as they happen.
Finally, the SMU calculates the final target frequency in hardware.
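
To make the flow above concrete, here is a rough sketch (in Python, purely for illustration) of the governor side: a utilization value is mapped to a desired performance value and clamped to the allowed range before it is handed to the SMU. The function name, its parameters, and the 1.25 headroom factor (borrowed from schedutil's frequency formula) are assumptions for this example, not the driver's actual code, and the SMU's final arbitration is not modelled.

```python
def desired_perf(util, max_util, highest_perf, lowest_nonlinear_perf, headroom=1.25):
    """Map a scheduler utilization value to a desired performance request.

    Hypothetical helper: all names and the headroom factor are assumptions
    made for this illustration only.
    """
    if max_util <= 0:
        raise ValueError("max_util must be positive")
    # Scale utilization into the performance range, with some headroom so the
    # request rises before the CPU is fully saturated.
    raw = int(headroom * highest_perf * util / max_util)
    # Clamp between the minimum and maximum the driver is willing to request.
    return max(lowest_nonlinear_perf, min(raw, highest_perf))


if __name__ == "__main__":
    # Example capability values are made up for the demonstration.
    for util in (0, 256, 512, 1024):
        print(util, desired_perf(util, 1024, highest_perf=166, lowest_nonlinear_perf=39))
```

The value returned here is only a request; as described above, the hardware decides the final target frequency.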

> 
> I can see that the driver, internally, is sending "lowest nonlinear" as minimum
> perf, 255 as maximum perf, and whatever the governor wants as desired perf.
> It just isn't exposed in sysfs so there isn't much point in documenting that.
> 

I will add more description to the RST documentation in V3. Thank you for your suggestion!
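
As an illustration of the request described above (minimum perf, maximum perf, desired perf, plus the EPP hint), the following Python sketch packs the four 8-bit fields into one 32-bit value. The bit positions used here (max in bits 7:0, min in 15:8, desired in 23:16, EPP in 31:24) are an assumption for this example and should be checked against the register definition; the function names are hypothetical.

```python
def pack_cppc_request(min_perf, max_perf, des_perf, epp=0):
    """Pack four 8-bit performance-control fields into one 32-bit value.

    Assumed layout (not confirmed by this thread):
    bits 7:0 max, 15:8 min, 23:16 desired, 31:24 EPP.
    """
    fields = {"min_perf": min_perf, "max_perf": max_perf,
              "des_perf": des_perf, "epp": epp}
    for name, value in fields.items():
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value}")
    return max_perf | (min_perf << 8) | (des_perf << 16) | (epp << 24)


def unpack_cppc_request(value):
    """Inverse of pack_cppc_request, returning a dict of the four fields."""
    return {
        "max_perf": value & 0xFF,
        "min_perf": (value >> 8) & 0xFF,
        "des_perf": (value >> 16) & 0xFF,
        "epp": (value >> 24) & 0xFF,
    }


if __name__ == "__main__":
    # min = an example "lowest nonlinear" value, max = 255, desired = governor's choice.
    req = pack_cppc_request(min_perf=39, max_perf=255, des_perf=100)
    print(hex(req), unpack_cppc_request(req))
```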

> > [...]
> > Full MSR Support
> > -----------------
> >
> > Some new Zen3 processors such as Cezanne provide the MSR registers
> > directly while the :c:macro:`X86_FEATURE_AMD_CPPC_EXT` CPU feature
> > flag is set.
> > ``amd-pstate`` can handle the MSR register to implement the fast
> > switch function in ``CPUFreq`` that can shrink latency of frequency
> > control on the interrupt context.
> 
> A-ha! Cezanne. I have an EPYC Milan, so that's probably why I can't get the
> "Full MSR Support". I'll test the "Shared Memory Support" then, and report
> my data.
> 

Looking forward to your results. 😊

Thanks,
Ray
