lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <jhj5z9viw6p.mognet@arm.com>
Date:   Thu, 06 Aug 2020 17:19:26 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     linux-kernel@...r.kernel.org, Quentin Perret <qperret@...gle.com>,
        peterz@...radead.org, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, morten.rasmussen@....com
Subject: Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards


On 06/08/20 15:20, Ingo Molnar wrote:
> * Valentin Schneider <valentin.schneider@....com> wrote:
>
>> We currently set this flag *only* on domains whose topology level exactly
>> match the level where we detect asymmetry (as returned by
>> asym_cpu_capacity_level()). This is rather problematic.
>>
>> Say there are two clusters in the system, one with a lone big CPU and the
>> other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):
>>
>> DIE [                ]
>> MC  [             ][ ]
>>      0   1   2   3  4
>>      L   L   B   B  B
>>
>> asym_cpu_capacity_level() will figure out that the MC level is the one
>> where all CPUs can see a CPU of max capacity, and we will thus set
>> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
>>
>> That lone big CPU will degenerate its MC domain, since it would be alone in
>> there, and will end up with just a DIE domain. Since the flag was only set
>> at MC, this CPU ends up not seeing any SD with the flag set, which is
>> broken.
>>
>> Rather than clearing dflags at every topology level, clear it before
>> entering the topology level loop. This will properly propagate upwards
>> flags that are set starting from a certain level.
>>
>> Reviewed-by: Quentin Perret <qperret@...gle.com>
>> Signed-off-by: Valentin Schneider <valentin.schneider@....com>
>> ---
>>  kernel/sched/topology.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 865fff3ef20a..42b89668e1e4 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1985,11 +1985,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>      /* Set up domains for CPUs specified by the cpu_map: */
>>      for_each_cpu(i, cpu_map) {
>>              struct sched_domain_topology_level *tl;
>> +		int dflags = 0;
>>
>>              sd = NULL;
>>              for_each_sd_topology(tl) {
>> -			int dflags = 0;
>> -
>>                      if (tl == tl_asym) {
>>                              dflags |= SD_ASYM_CPUCAPACITY;
>>                              has_asym = true;
>
> I'd suggest ordering all patches with potential side effects at the
> end, to make them easier to bisect.
>
> I.e. I'd reorder this series to do:
>
>  - Obviously correct renamings & cleanups
>
>  - Convert the code over to the new instrumented sd-flags method. This
>    will presumably spew a few warnings for problems the new debugging
>    checks catch in existing topologies.
>
>  - Do all the behavioral changes and fixes like this patch, even if we
>    think that they have no serious side effects.
>
> In that sense it might make sense to order the two ARM patches to the
> later stage as well - but I suppose it's OK to do those two first as
> well.
>

This does sound sensible; I can shuffle this around for v5.

FWIW the reason I had this very patch before the instrumentation is that
IMO it really wants to be propagated and could thus directly be tagged with
SDF_SHARED_PARENT when the instrumentation hits. It's a minor thing, but
having it after the instrumentation means that I'll first have to tag it
without any hierarchical metaflag, and then tag it with SDF_SHARED_PARENT
in the propagation fix.

If that sounds fine by you, I'll do just that.

> Nice series otherwise, these new checks look really useful and already
> caught bugs.
>

Thanks!

> Thanks,
>
>       Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ