Message-ID: <8f26fac2-787d-44a9-a0cd-c3035a91149c@arm.com>
Date: Mon, 6 Jan 2025 13:59:59 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Lukasz Luba <lukasz.luba@....com>, Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Morten Rasmussen <morten.rasmussen@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
Pierre Gondois <pierre.gondois@....com>
Subject: Re: [RFC][PATCH v0.2 5/9] PM: EM: Introduce
 em_dev_expand_perf_domain()
On 17/12/2024 21:40, Rafael J. Wysocki wrote:
> On Tue, Dec 17, 2024 at 10:38 AM Dietmar Eggemann
> <dietmar.eggemann@....com> wrote:
>>
>> On 29/11/2024 17:02, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>
>>> Introduce a helper function for adding a CPU to an existing EM perf
>>> domain.
>>>
>>> Subsequently, this will be used by the intel_pstate driver to add new
>>> CPUs to existing perf domains when those CPUs go online for the first
>>> time after the initialization of the driver.
>>>
>>> No intentional functional impact.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>> ---
>>>
>>> v0.1 -> v0.2: No changes
>>
>> Could you add information on why this new EM interface is needed?
>
> There is some of it in the changelog already.
>
> In fact, it is only needed in a corner case when the system starts
> with some CPUs offline and they only go online later (as a result of
> an explicit request from user space). That is the only case when a
> new CPU may need to be added to an existing perf domain.
OK, I see. Arm doesn't need this so far since we derive the PD masks
from the cpufreq policies.
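Just to picture how the new interface would get used (minimal sketch;
the exact signature of em_dev_expand_perf_domain() is my assumption
here, not taken from the patch):

    /* CPU was offline during driver init and a PD for its type exists */
    ret = em_dev_expand_perf_domain(pd_dev, cpu);
    if (ret)
            pr_debug("CPU%d could not be added to the existing PD\n", cpu);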
I just verified that we both keep hotplugged-out CPUs within the PD.
That's why we mask the PD CPUs with cpu_online_mask in:

find_energy_efficient_cpu()
  ...
  for (; pd; pd = pd->next)
    cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
>> IIRC, you can't use the existing way (cpufreq_driver::register_em) since
>> it gets called too early (3) for the PD cpumasks to be ready. This issue
>> will be there for any system in which uarch domains are not congruent
>> with clock domains, something we haven't had to deal with on Arm's
>> heterogeneous CPUs so far.
>
> Basically, yes.
>
>> __init intel_pstate_init()
>>   intel_pstate_register_driver()
>>     cpufreq_register_driver()
>>       subsys_interface_register()
>>         sif->add_dev() -> cpufreq_add_dev()
>>           cpufreq_online()
>>             if (!new_policy && cpufreq_driver->online)
>>             else
>>               cpufreq_driver->init() -> intel_pstate_cpu_init()
>>                 __intel_pstate_cpu_init()
>>                   intel_pstate_init_cpu()
>>                     intel_pstate_get_cpu_pstates()
>>                       hybrid_add_to_domain()
>>                         em_dev_expand_perf_domain() <-- (1)
>>                   intel_pstate_init_acpi_perf_limits()
>>                     intel_pstate_set_itmt_prio() <-- (2)
>>             if (new_policy)
>>               cpufreq_driver->register_em() <-- (3)
>>                 hybrid_init_cpu_capacity_scaling()
>>                   hybrid_refresh_cpu_capacity_scaling()
>>                     __hybrid_refresh_cpu_capacity_scaling() <-- (4)
>>                     hybrid_register_all_perf_domains()
>>                       hybrid_register_perf_domain()
>>                         em_dev_register_perf_domain() <-- (5)
>>
>> /* Enable EAS */
>> sched_clear_itmt_support() <-- (6)
>>
>> Debugging this on a 'nosmt' i7-13700K (online mask =
>> [0,2,4,6,8,10,12,14,16-23]):
>>
>> (1) Add CPU to existing hybrid PD or create new hybrid PD.
>
> Not exactly.
>
> (1) is just to expand an existing perf domain if the CPU is new (was
> offline all the time previously).
OK.
>
> Likewise, the direct hybrid_register_perf_domain() call in
> hybrid_add_to_domain() is to add a new perf domain if the given CPU is
> new (was offline all the time previously) and is the first one of the
> given type (say, the system is starting with all E-cores offline).
> It won't succeed before (4).
>
> For CPUs that are online to start with, hybrid_add_to_domain() assigns
> them to hybrid domains and PDs are created for them in
> hybrid_register_all_perf_domains().
Understood.
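So, restating the flow in pseudo-code just to check my understanding
(function names are from your series, the structure is my guess):

hybrid_add_to_domain(cpu)
  if (CPU was offline during driver init)        /* "new" CPU */
    if (a PD for this CPU type already exists)
      em_dev_expand_perf_domain()                <-- (1)
    else
      hybrid_register_perf_domain(cpu)           /* first CPU of its type;
                                                    cannot succeed before (4) */
  else
    /* CPU online from the start: only assign it to its hybrid domain
       here, the PD gets registered later via
       hybrid_register_all_perf_domains() (5) */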
>
>> (2) Triggers sched domain rebuild (+ enabling EAS) already here during
>> startup ?
>
> This is just to enable ITMT which is the default mechanism for Intel
> hybrid platforms. It also requires a rebuild of sched domains to be
> enabled.
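
Right, and for reference this is roughly what (2) boils down to AFAICS
(simplified from my reading of intel_pstate.c):

intel_pstate_set_itmt_prio(cpu)
  cppc_get_perf_caps(cpu, &cppc_perf)
  sched_set_itmt_core_prio(cppc_perf.highest_perf, cpu)
  if (max_highest_perf > min_highest_perf)
    /* favored cores detected, enable ITMT; this can't be done under the
       hotplug lock, so it goes through a work item which calls
       sched_set_itmt_support() -> rebuild of the sched domains */
    schedule_work(&sched_itmt_work)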
>
>> IMHO, the reason is that max_highest_perf > min_highest_perf because of
>> different ITMT prios.
>
> Yes (which means that the platform is at least not homogeneous).
>
> This really has been introduced for the handling of favored cores on
> otherwise non-hybrid platforms (say Tiger Lake).
>
>> Happens for CPU8 on my machine (after CPU8 is added to hybrid PD
>> 0,2,4,6,8): the ITMT prio for CPU8 is 69 (1024) instead of 68 (1009).
>> So it looks like EAS is enabled before (6)?
>
> No, it is ITMT because CPU8 is a favored core.
>
>> (3) ARM's way to do (5)
>> (4) Setting hybrid_max_perf_cpu
>> (5) Register EM here
>> (6) The call which actually triggers the initial sched domain rebuild
>> (+ enables EAS) (already done in (2) on my machine)
>
> This is the second rebuild of sched domains to turn off ITMT and
> enable EAS. The previous one is to enable ITMT.
>
> The earlier enabling of ITMT could be avoided I think, but that would
> be a complication on platforms that contain favored cores but
> otherwise are not hybrid.
OK, the fact that EAS will already be enabled in (2) is not an issue IMHO.
>
>> So (3) is not possible for Intel hybrid since the policy's cpumask(s)
>
> It is possible in principle, but not particularly convenient because
> at that point it is not clear whether or not the platform is really
> hybrid and SMT is off and so whether or not EAS is to be used.
>
>> contain only one CPU each, i.e. the CPUs are not sharing a clock.
>> And those cpumasks have to be built under (1) to be used in (5)?
>
> They are built by the caller of (1) to be precise, but yes.
OK.
I still see an issue with putting all performance CPUs into one PD.
find_energy_efficient_cpu() assumes that all PD CPUs have the same CPU
capacity value:
find_energy_efficient_cpu()
  ...
  for (; pd; pd = pd->next) {
    ...
    /* Account external pressure for the energy estimation */
    cpu = cpumask_first(cpus);
    cpu_actual_cap = get_actual_cpu_capacity(cpu); --> (1)
Even though x86 does not implement hw_load_avg() or
cpufreq_get_pressure(), we would still assume that all PD CPUs have the
same cpu_capacity:

get_actual_cpu_capacity() <-- (1)
  capacity = arch_scale_cpu_capacity(cpu)
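
For completeness, the whole helper, as I read it in current mainline
kernel/sched/fair.c:

static unsigned long get_actual_cpu_capacity(int cpu)
{
	unsigned long capacity = arch_scale_cpu_capacity(cpu);

	/* subtract HW and cpufreq pressure from the arch capacity */
	capacity -= max(hw_load_avg(cpu_rq(cpu)), cpufreq_get_pressure(cpu));

	return capacity;
}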
Now the error introduced is small (1024 versus 1009) on my i7-13700K, but
it's there. How big can those diffs based on ITMT prio get?
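(If I get the capacity computation right, those numbers are just the
highest_perf/ITMT prio values scaled against the best CPU, i.e.
68/69 * 1024 ~= 1009 vs. 69/69 * 1024 = 1024, so the diff within one PD
grows with the spread of highest_perf values among its CPUs.)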