Date: Tue, 21 May 2024 14:51:04 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, x86 Maintainers
 <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
 Linux PM <linux-pm@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
 Peter Zijlstra <peterz@...radead.org>,
 Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
 Ricardo Neri <ricardo.neri@...el.com>, Tim Chen <tim.c.chen@...el.com>
Subject: Re: [RFC][PATCH v1 3/3] cpufreq: intel_pstate: Set asymmetric CPU
 capacity on hybrid systems

On 06/05/2024 16:39, Rafael J. Wysocki wrote:
> On Thu, May 2, 2024 at 12:43 PM Dietmar Eggemann
> <dietmar.eggemann@....com> wrote:
>>
>> On 25/04/2024 21:06, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

[...]

>> So cpu_capacity has a direct mapping to itmt prio: cpu_capacity is the
>> itmt prio scaled so that the max itmt prio maps to 1024.
> 
> Right.
> 
> The choice to make the ITMT prio reflect the capacity is deliberate,
> although this code works with values retrieved via CPPC (which are the
> same as the HWP_CAP values in the majority of cases but not always).
> 
>> Running it on i7-13700K (while allowing SMT) gives:
>>
>> root@...liver:~# dmesg | grep sched_set_itmt_core_prio
>> [    3.957826] sched_set_itmt_core_prio() cpu=0 prio=68
>> [    3.990401] sched_set_itmt_core_prio() cpu=1 prio=68
>> [    4.015551] sched_set_itmt_core_prio() cpu=2 prio=68
>> [    4.040720] sched_set_itmt_core_prio() cpu=3 prio=68
>> [    4.065871] sched_set_itmt_core_prio() cpu=4 prio=68
>> [    4.091018] sched_set_itmt_core_prio() cpu=5 prio=68
>> [    4.116175] sched_set_itmt_core_prio() cpu=6 prio=68
>> [    4.141374] sched_set_itmt_core_prio() cpu=7 prio=68
>> [    4.166543] sched_set_itmt_core_prio() cpu=8 prio=69
>> [    4.196289] sched_set_itmt_core_prio() cpu=9 prio=69
>> [    4.214964] sched_set_itmt_core_prio() cpu=10 prio=69
>> [    4.239281] sched_set_itmt_core_prio() cpu=11 prio=69
> 
> CPUs 8 - 11 appear to be "favored cores" that can turbo up higher than
> the other P-cores.
> 
>> [    4.263438] sched_set_itmt_core_prio() cpu=12 prio=68
>> [    4.283790] sched_set_itmt_core_prio() cpu=13 prio=68
>> [    4.308905] sched_set_itmt_core_prio() cpu=14 prio=68
>> [    4.331751] sched_set_itmt_core_prio() cpu=15 prio=68
>> [    4.356002] sched_set_itmt_core_prio() cpu=16 prio=42
>> [    4.381639] sched_set_itmt_core_prio() cpu=17 prio=42
>> [    4.395175] sched_set_itmt_core_prio() cpu=18 prio=42
>> [    4.425625] sched_set_itmt_core_prio() cpu=19 prio=42
>> [    4.449670] sched_set_itmt_core_prio() cpu=20 prio=42
>> [    4.479681] sched_set_itmt_core_prio() cpu=21 prio=42
>> [    4.506319] sched_set_itmt_core_prio() cpu=22 prio=42
>> [    4.523774] sched_set_itmt_core_prio() cpu=23 prio=42
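
FWIW, scaling those prios the same way gives me 1009, 1024 and 623 for
P-cores, favored cores and E-cores respectively. Minimal userspace sketch
of the scaling as I understand it (not the actual patch code;
prio_to_capacity() is made up, and IIUC the driver works on the
CPPC/HWP_CAP perf values rather than on the prios themselves):

#include <stdio.h>

#define SCHED_CAPACITY_SCALE    1024

/* Scale a perf/prio value so that the highest one maps to 1024. */
static unsigned int prio_to_capacity(unsigned int prio, unsigned int max_prio)
{
        return prio * SCHED_CAPACITY_SCALE / max_prio;
}

int main(void)
{
        /* prios from the dmesg above: P-core, favored core, E-core */
        unsigned int prios[] = { 68, 69, 42 };

        for (int i = 0; i < 3; i++)
                printf("prio=%u -> capacity=%u\n",
                       prios[i], prio_to_capacity(prios[i], 69));

        return 0;
}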

I wonder what the relation is between this CPU capacity value based on
HWP_CAP and the per-IPC class performance values of the 'HFI performance
and efficiency score' table.

Running '[PATCH v3 00/24] sched: Introduce classes of tasks for load
balance' on i7-13700K w/ 'nosmt' I get:

                        Score
CPUs                    Class 0   Class 1   Class 2   Class 3
                        SSE       AVX2      VNNI      PAUSE

0, 2, 4, 6, 12, 14      68        80        106       53
8, 10                   69        81        108       54
16-23                   42        42        42        42

Looks like the HWP_CAP values are in sync with the scores of IPC Class 0.
I was expecting the HWP_CAP values to reflect more of an average over all
classes. Or am I misinterpreting this relation?
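(Back-of-the-envelope: a plain average over the four class scores would be
~77 for the first P-core row, 78 for the favored cores and 42 for the
E-cores.)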

[...]

>>> If the driver's "no_turbo" sysfs attribute is updated, all of the CPU
>>> capacity information is computed from scratch to reflect the new turbo
>>> status.
>>
>> So if I do:
>>
>> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>>
>> I get:
>>
>> [ 1692.801368] hybrid_update_cpu_scaling() called
>> [ 1692.801381] hybrid_update_cpu_scaling() max_cap_perf=44, max_perf_cpu=0
>> [ 1692.801389] hybrid_set_cpu_capacity() cpu=1 cap=1024
>> [ 1692.801395] hybrid_set_cpu_capacity() cpu=2 cap=1024
>> [ 1692.801399] hybrid_set_cpu_capacity() cpu=3 cap=1024
>> [ 1692.801402] hybrid_set_cpu_capacity() cpu=4 cap=1024
>> [ 1692.801405] hybrid_set_cpu_capacity() cpu=5 cap=1024
>> [ 1692.801408] hybrid_set_cpu_capacity() cpu=6 cap=1024
>> [ 1692.801410] hybrid_set_cpu_capacity() cpu=7 cap=1024
>> [ 1692.801413] hybrid_set_cpu_capacity() cpu=8 cap=1024
>> [ 1692.801416] hybrid_set_cpu_capacity() cpu=9 cap=1024
>> [ 1692.801419] hybrid_set_cpu_capacity() cpu=10 cap=1024
>> [ 1692.801422] hybrid_set_cpu_capacity() cpu=11 cap=1024
>> [ 1692.801425] hybrid_set_cpu_capacity() cpu=12 cap=1024
>> [ 1692.801428] hybrid_set_cpu_capacity() cpu=13 cap=1024
>> [ 1692.801431] hybrid_set_cpu_capacity() cpu=14 cap=1024
>> [ 1692.801433] hybrid_set_cpu_capacity() cpu=15 cap=1024
>> [ 1692.801436] hybrid_set_cpu_capacity() cpu=16 cap=605
>> [ 1692.801439] hybrid_set_cpu_capacity() cpu=17 cap=605
>> [ 1692.801442] hybrid_set_cpu_capacity() cpu=18 cap=605
>> [ 1692.801445] hybrid_set_cpu_capacity() cpu=19 cap=605
>> [ 1692.801448] hybrid_set_cpu_capacity() cpu=20 cap=605
>> [ 1692.801451] hybrid_set_cpu_capacity() cpu=21 cap=605
>> [ 1692.801453] hybrid_set_cpu_capacity() cpu=22 cap=605
>> [ 1692.801456] hybrid_set_cpu_capacity() cpu=23 cap=605
>>
>> So turbo on this machine only accounts for the cpu_capacity diff of
>> 1009 vs 1024?
> 
> Not really.
> 
> The capacity of the fastest CPU is always 1024 and the capacities of
> all of the other CPUs are adjusted to that.
> 
> When turbo is disabled, the capacity of the "favored cores" is the
> same as for the other P-cores (i.e. 1024) and the capacity of E-cores
> is relative to that.
> 
> Of course, this means that task placement may be somewhat messed up
> after disabling or enabling turbo (which is a global switch), but I
> don't think that there is a way to avoid it.

I assume that this is OK. In task placement we don't deal with a system of
perfectly aligned values (including their sums) anyway.
And we recreate the sched domains (including updating the capacity sums on
the sched groups) after this, so load balance (SMP nice etc.) should
be fine too.
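
Just to double-check my understanding of the no_turbo numbers from above
(back-of-the-envelope, so possibly off): with max_cap_perf=44 the P-cores
all end up at 1024, and cap=605 for the E-cores would then correspond to a
no-turbo perf value of ~26, since 26 * 1024 / 44 = 605.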
