linux-kernel - Re: [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid platforms without SMT


Message-ID: <6ab0531a-d6d8-46ac-9afc-23cf87f37905@arm.com>
Date: Thu, 3 Apr 2025 11:47:22 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
 Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Lukasz Luba <lukasz.luba@....com>,
 Peter Zijlstra <peterz@...radead.org>,
 Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Morten Rasmussen <morten.rasmussen@....com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
 Pierre Gondois <pierre.gondois@....com>
Subject: Re: [RFC][PATCH v0.3 0/6] cpufreq: intel_pstate: Enable EAS on hybrid
 platforms without SMT - alternative

On 3/7/25 19:12, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This is a new take on the "EAS for intel_pstate" work:
> 
> https://lore.kernel.org/linux-pm/5861970.DvuYhMxLoT@rjwysocki.net/
> 
> with refreshed preparatory patches and a revised energy model design.
> 
> The following paragraph from the original cover letter still applies:
> 
> "The underlying observation is that on the platforms targeted by these changes,
> Lunar Lake at the time of this writing, the "small" CPUs (E-cores), when run at
> the same performance level, are always more energy-efficient than the "big" or
> "performance" CPUs (P-cores).  This means that, regardless of the scale-
> invariant utilization of a task, as long as there is enough spare capacity on
> E-cores, the relative cost of running it there is always lower."
> 
> However, this time perf domains are registered per CPU and in addition to the
> primary cost component, which is related to the CPU type, there is a small
> component proportional to performance whose role is to help balance the load
> between CPUs of the same type.
> 
> This is done to avoid migrating tasks too much between CPUs of the same type,
> especially between E-cores, which has been observed in tests of the previous
> iteration of this work.
> 
> The expected effect is still that the CPUs of the "low-cost" type will be
> preferred so long as there is enough spare capacity on any of them.
> 
> The first two patches in the series rearrange cpufreq checks related to EAS so
> that sched_is_eas_possible() doesn't have to access cpufreq internals directly
> and patch [3/6] changes those checks to also allow EAS to be used with cpufreq
> drivers that implement internal governors (like intel_pstate).
> 
> Patches [4-5/6] deal with the Energy Model code.  Patch [4/6] simply rearranges
> it so as to allow the next patch to be simpler and patch [5/6] adds a function
> that's used in the last patch.
> 
> Patch [6/6] is the actual intel_pstate modification which now is significantly
> simpler than before because it doesn't need to track the type of each CPU
> directly in order to put into the right perf domain.
> 
> Please refer to the individual patch changelogs for details.
> 
> For easier access, the series is available on the experimental/intel_pstate/eas-take2
> branch in linux-pm.git:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> experimental/intel_pstate/eas-take2
> 
> or
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/intel_pstate/eas-take2
> 
> Thanks!
> 
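
A toy sketch of the cost design described in the quoted cover letter: a primary per-type cost plus a small term proportional to performance, so that E-cores win whenever one has spare capacity and load spreads among CPUs of the same type. The constants, the formula and the names (BASE_COST, BALANCE_WEIGHT, pick_cpu) are illustrative assumptions, not values or code taken from the patches.

```python
# Toy model only: constants and formula are assumptions, not the patch's values.
BASE_COST = {"E": 1.0, "P": 2.0}   # hypothetical primary cost per CPU type
BALANCE_WEIGHT = 0.001             # hypothetical small per-performance term


def cost(cpu_type, perf):
    """Type-dependent cost plus a small component proportional to performance."""
    return BASE_COST[cpu_type] + BALANCE_WEIGHT * perf


def pick_cpu(cpus, task_util):
    """Return the name of the cheapest CPU with enough spare capacity, or None."""
    candidates = [c for c in cpus if c["capacity"] - c["util"] >= task_util]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: cost(c["type"], c["util"] + task_util))
    return best["name"]


cpus = [
    {"name": "E0", "type": "E", "capacity": 500, "util": 300},
    {"name": "E1", "type": "E", "capacity": 500, "util": 100},
    {"name": "P0", "type": "P", "capacity": 1024, "util": 0},
]
small_task_cpu = pick_cpu(cpus, 100)   # fits on an E-core; the less loaded one wins
large_task_cpu = pick_cpu(cpus, 450)   # no E-core has room, so a P-core is used
```

In this toy model the per-type term decides between E-cores and P-cores, while the small term only breaks ties among CPUs of the same type.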

Hi Rafael,

As promised, I ran the same tests as with v0.2. The results are better with v0.3,
though it is hard to say whether that is because of the cache affinity on the P-cores.

Interestingly, our nosmt Raptor Lake 8+8 should be worse off now with its 16 PDs.
If L2 is shared anyway, it may be worth experimenting with one PD for the E-cores
and per-CPU PDs for the P-cores (so 4+1+1+1+1 for Lunar Lake).
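
To make the two layouts concrete, here is a small sketch that enumerates the perf domains for each scheme. The CPU numbering (P-cores first, then E-cores) and the helper name perf_domains are assumptions for illustration only; this is not how the kernel registers PDs.

```python
def perf_domains(p_cpus, e_cpus, shared_e_domain):
    """Return a list of perf domains, each a list of CPU ids.

    shared_e_domain=False: one PD per CPU (the per-CPU registration in v0.3).
    shared_e_domain=True:  one PD spanning all E-cores, per-CPU PDs for P-cores.
    """
    if shared_e_domain:
        domains = [list(e_cpus)]
    else:
        domains = [[cpu] for cpu in e_cpus]
    domains += [[cpu] for cpu in p_cpus]
    return domains


# Assumed numbering: Lunar Lake 4P+4E, Raptor Lake 8P+8E (nosmt).
lunar_lake = perf_domains(range(0, 4), range(4, 8), shared_e_domain=True)
raptor_lake = perf_domains(range(0, 8), range(8, 16), shared_e_domain=False)

lunar_sizes = [len(d) for d in lunar_lake]   # sizes of the 4+1+1+1+1 layout
raptor_count = len(raptor_lake)              # number of per-CPU PDs
```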

Anyway, these are the results, again from 20 iterations of 5 minutes each:

Firefox YouTube 4K video playback:
EAS:
376.229 +-9.566835596650195
CAS:
661.323 +-18.951739322113248
(43.1% less energy used with EAS than with CAS)
(cf. 24.2% less energy used with EAS v0.2)

Firefox Web Aquarium 500 fish:
EAS:
331.933 +-10.977847441299437
CAS:
515.594 +-16.997636567737562
(35.6% less energy used with EAS than with CAS)
(This wasn't tested on v0.2; I ran it just to see whether the above was a lucky workload hit.)

Neither workload shows any performance hit with EAS (FPS is very stable for both).

v0.2 results:
https://lore.kernel.org/lkml/3861524b-b266-4e54-b7ab-fdccbb7b4177@arm.com/
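
For reference, the percentages above follow from the reported means as (EAS - CAS) / CAS. A minimal sketch of that arithmetic, with relative_change being a helper name made up for this example:

```python
def relative_change(eas_mean, cas_mean):
    """Energy change of EAS relative to CAS, in percent (negative = less energy)."""
    return (eas_mean - cas_mean) / cas_mean * 100.0


# Means copied from the results above.
youtube = relative_change(376.229, 661.323)
aquarium = relative_change(331.933, 515.594)

print(f"YouTube 4K:   {youtube:.1f}%")
print(f"Web Aquarium: {aquarium:.1f}%")
```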