linux-kernel - Re: [PATCHv3 1/9] sched: Add static_key for asymmetric cpu capacity optimizations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 28 Jun 2018 19:16:46 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Morten Rasmussen <morten.rasmussen@....com>
Cc:     Quentin Perret <quentin.perret@....com>, peterz@...radead.org,
        mingo@...hat.com, valentin.schneider@....com,
        vincent.guittot@...aro.org, gaku.inami.xh@...esas.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv3 1/9] sched: Add static_key for asymmetric cpu capacity
 optimizations

On 06/28/2018 10:48 AM, Morten Rasmussen wrote:
> On Wed, Jun 27, 2018 at 05:41:22PM +0200, Dietmar Eggemann wrote:
>> On 06/22/2018 04:36 PM, Morten Rasmussen wrote:
>>> On Fri, Jun 22, 2018 at 09:22:22AM +0100, Quentin Perret wrote:

[...]

>>>> What would happen if you hotplugged an entire cluster ? You'd loose the
>>>> SD_ASYM_CPUCAPACITY flag but keep the static key is that right ? Should
>>>> we care ?
>>>
>>> I don't think we should care. The static key enables additional checks
>>> and tweaks but AFAICT none of them requires the SD_ASYM_CPUCAPACITY to
>>> be set and they should all be have no effect if that is the case. I
>>> added the static key just avoid the overhead on systems where they would
>>> have no effect. At least that is intention, I could course have broken
>>> things by mistake.
>>
>> I tent to agree for misfit but it would be easy to just add an
>> static_branch_disable_cpuslocked() into the else path of if(enable).
> 
> Depending on how we address the exclusive cpuset mess Quentin pointed
> out, it isn't as simple as that. As it is with our current
> not-yet-posted patches we would never remove the SD_ASYM_CPUCAPACITY
> flag, so we would never do a static_branch_disable_cpuslocked().

I was referring rather to the 'hotplug out an entire cluster' mentioned 
above. Looking closer into the code, I see that this will only work for 
traditional big.LITTLE (one big, one little cluster) since on them the 
DIE level vanishes and so the SD_ASYM_CPUCAPACITY flag.
Currently we only detect the flags when the system comes up (in the 
init_cpu_capacity_callback()) and not when we hotplug cpus. That's why 
it doesn't work for your 'three cluster system where 0-3 and 4-7 little 
clusters, and 8-11 is a big cluster' example.
So we don't re-detect the flags every time we call from the scheduler 
into the arch code and so the the DIE level arch topology flag function 
will return SD_ASYM_CPUCAPACITY.

> However, I'm currently thinking that we should probably set the flag
> according to the cpus in each root_domain. If we do that, we can't just
> disable the static_key because one root_domain is SMP, so we would have
> to track if _any_ root_domain (exclusive cpuset topology) has the flag
> set.

This is then in sync with the energy model where the static key should 
be enabled if any root domain can do EAS. The static key would be system 
wide, not per root domain.

> Is it worth it to iterate over all the exclusive cpuset with all
> the locking implications to disable the static_key in the corner case
> where exclusive cpuset are set up for all cpu capacities in the system?

Don't know either? But if the code to do this would include 'exclusive 
cpusets' and platforms like your three cluster example then maybe ...

[...]

>>> I'm tempted to say we shouldn't care in this situation. Setting the
>>> flags correctly in the three cluster example would require knowledge
>>> about the cpuset configuration which we don't have in the arch code so
>>> SD_ASYM_CPUCAPACITY flag detection would have be done by the
>>> sched_domain build code. However, not setting the flag according to the
>>> actual members of the exclusive cpuset means that homogeneous
>>> sched_domains might have SD_ASYM_CPUCAPACITY set enabling potentially
>>> wrong scheduling decisions.
>>
>> We could easily pass the CPU as an argument to all these
>> sched_domain_flags_f functions.
>>
>> -typedef int (*sched_domain_flags_f)(void);
>> +typedef int (*sched_domain_flags_f)(int cpu);
>>
>> In this case, the arch specific flag functions on a sched domain (sd) level
>> could use the corresponding sched_domain_mask_f function to iterate over the
>> span of the sd seen by CPU instead of all online cpus.
> 
> Yeah, but I'm afraid that doesn't solve the problem. The
> sched_domain_mask_f function doesn't tell us anything about any
> exclusive cpusets. We need sched_domain_span(sd) which is

You're right, I checked again and even if we hotplug, the flag function 
tl->mask(cpu) like cpu_coregroup_mask() always return the initial set of 
cpus. So we are only save if the DIE sd vanishes and with it the 
SD_ASYM_CPUCAPACITY flag since nobody will call the appropriate arch 
topology flag function.

> sched_domain_mask_f & cpu_map where cpu_map is the cpumask of the
> exclusive cpuset. So we either need to pass the sched_domain_span() mask
> through sched_domain_flags_f or work our way back to that information
> based on the cpu id. I'm not sure if the latter is possible.

Agreed.

[...]

>> We could also say that systems with 2 clusters with the same uArch and same
>> max CPU frequency and additional clusters are insane, like we e.g. do with
>> the Energy Model and CPUs with different uArch within a frequency domain?
> 
> I fail to get the point here. Are you saying that SMP is insane? ;-)

The '... and additional clusters' is important, a system like your three 
cluster system you describe above.

But the problem that if you hotplug-out the big cluster, the DIE level 
is still there with the SD_ASYM_CPUCAPACITY flag is due to the fact that 
we don't start the flag detection mechanism during hotplug, right?

>>> We can actually end up with this problem just by hotplugging too. If you
>>> unplug the entire big cluster in the three cluster example above, you
>>> preserve DIE level which would have SD_ASYM_CPUCAPACITY set even though
>>> we only have little cpus left.

IMHO, again that's because we do flag detection only at system startup.

>>> As I see it, we have two choices: 1) Set the flags correctly for
>>> exclusive cpusets which means some additional "fun" in the sched_domain
>>> hierarchy set up, or 2) ignore it and make sure that setting
>>
>> I assume you refer to this cpu parameter for sched_domain_flags_f under 1).
> 
> No, what I had in mind was to let sd_init() set SD_ASYM_CPUCAPACITY
> flag as the arch-code doesn't know about the exclusive cpusets as
> discussed above. In the meantime I have thought about a different
> approach where sd_init() only disables the flag when it is no longer
> needed due to exclusive cpusets.

In this case sd_init() has to know the cpu capacity values. I assume 
that the arch still has to call rebuild_sched_domains() after cpufreq is 
up and running due to the max cpu frequency dependency for cpu capacity.