linux-kernel - Re: [PATCH 4/4] sched/topology: the group balance cpu must be a cpu where the group is installed

Message-ID: <91317113-f1a7-a1c6-812e-cbda5284d404@redhat.com>
Date:   Tue, 25 Apr 2017 12:56:23 -0300
From:   Lauro Venancio <lvenanci@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     lwang@...hat.com, riel@...hat.com, Mike Galbraith <efault@....de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] sched/topology: the group balance cpu must be a cpu
 where the group is installed

On 04/25/2017 12:39 PM, Peter Zijlstra wrote:
> On Tue, Apr 25, 2017 at 05:27:03PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 25, 2017 at 05:22:36PM +0200, Peter Zijlstra wrote:
>>> On Tue, Apr 25, 2017 at 05:12:00PM +0200, Peter Zijlstra wrote:
>>>> But I'll first try and figure out why I'm not having empty masks.
>>> Ah, so this is before all the degenerate stuff, so there's a bunch of
>>> redundant domains below that make it work -- and there always will be,
>>> unless FORCE_SD_OVERLAP.
>>>
>>> Now I wonder what triggered it.. let me put it back.
>> Ah! the asymmetric setup, where @sibling is entirely uninitialized for
>> the top domain.
>>
> And it still works correctly too:
>
>
> [    0.078756] XXX 1 NUMA 
> [    0.079005] XXX 2 NUMA 
> [    0.080003] XXY 0-2:0
> [    0.081007] XXX 1 NUMA 
> [    0.082005] XXX 2 NUMA 
> [    0.083003] XXY 1-3:3
> [    0.084032] XXX 1 NUMA 
> [    0.085003] XXX 2 NUMA 
> [    0.086003] XXY 1-3:3
> [    0.087015] XXX 1 NUMA 
> [    0.088003] XXX 2 NUMA 
> [    0.089002] XXY 0-2:0
>
>
> [    0.090007] CPU0 attaching sched-domain:
> [    0.091002]  domain 0: span 0-2 level NUMA
> [    0.092002]   groups: 0 (mask: 0), 1, 2
> [    0.093002]   domain 1: span 0-3 level NUMA
> [    0.094002]    groups: 0-2 (mask: 0) (cpu_capacity: 3072), 1-3 (cpu_capacity: 3072)
> [    0.095005] CPU1 attaching sched-domain:
> [    0.096003]  domain 0: span 0-3 level NUMA
> [    0.097002]   groups: 1 (mask: 1), 2, 3, 0
> [    0.098004] CPU2 attaching sched-domain:
> [    0.099002]  domain 0: span 0-3 level NUMA
> [    0.100002]   groups: 2 (mask: 2), 3, 0, 1
> [    0.101004] CPU3 attaching sched-domain:
> [    0.102002]  domain 0: span 1-3 level NUMA
> [    0.103002]   groups: 3 (mask: 3), 1, 2
> [    0.104002]   domain 1: span 0-3 level NUMA
> [    0.105002]    groups: 1-3 (mask: 3) (cpu_capacity: 3072), 0-2 (cpu_capacity: 3072)
>
>
> static void
> build_group_mask(struct sched_domain *sd, struct sched_group *sg, struct cpumask *mask)
> {
>         const struct cpumask *sg_span = sched_group_cpus(sg);
>         struct sd_data *sdd = sd->private;
>         struct sched_domain *sibling;
>         int i, funny = 0;
>
>         cpumask_clear(mask);
>
>         for_each_cpu(i, sg_span) {
>                 sibling = *per_cpu_ptr(sdd->sd, i);
>
>                 if (!sibling->child) {
>                         funny = 1;
>                         printk("XXX %d %s %*pbl\n", i, sd->name, cpumask_pr_args(sched_domain_span(sibling)));
>                         continue;
>                 }
>
>                 /* If we would not end up here, we can't continue from here */
>                 if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
>                         continue;
>
>                 cpumask_set_cpu(i, mask);
>         }
>
>         if (funny) {
>                 printk("XXY %*pbl:%*pbl\n",
>                                 cpumask_pr_args(sg_span),
>                                 cpumask_pr_args(mask));
>         }
> }
>
>
> So that will still get the right balance cpu and thus sgc.
>
> Another thing I've been thinking about; I think we can do away with the
> kzalloc() in build_group_from_child_sched_domain() and use the sdd->sg
> storage.
I considered this too. I decided not to change it because I was not
sure whether the kzalloc() was there for performance reasons. Currently,
all groups are allocated on the NUMA node where they are used.
If we use the sdd->sg storage, we may end up with groups allocated on
one NUMA node being used on another node.
>
> I just didn't want to move too much code around again, and ideally put
> more assertions in place to catch bad stuff; I just haven't had a good
> time thinking of good assertions :/
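
The mask results in the quoted log ("XXY 0-2:0" and "XXY 1-3:3") can be reproduced with a small user-space model of the quoted build_group_mask() loop. This is only a sketch: struct model_domain, model_build_group_mask(), model_first_cpu() and model_top_level are hypothetical names that do not exist in the kernel, cpumasks are reduced to 32-bit bitmasks, and picking the lowest CPU in the mask as the balance cpu is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for one CPU's sched_domain at the top NUMA level. */
struct model_domain {
	int has_child;		/* 0 models sibling->child == NULL */
	uint32_t child_span;	/* bitmask of CPUs spanned by the child domain */
};

/*
 * Models the loop in the quoted build_group_mask(): a CPU of the group
 * enters the mask only if its domain at this level has a child and that
 * child spans exactly the group.
 */
static uint32_t model_build_group_mask(uint32_t sg_span,
				       const struct model_domain *siblings,
				       int nr_cpus)
{
	uint32_t mask = 0;

	assert(nr_cpus > 0 && nr_cpus <= 32);

	for (int i = 0; i < nr_cpus; i++) {
		if (!(sg_span & (UINT32_C(1) << i)))
			continue;	/* CPU i is not in the group */
		if (!siblings[i].has_child)
			continue;	/* the "funny" case in the quoted code */
		if (siblings[i].child_span != sg_span)
			continue;	/* child does not span the group */
		mask |= UINT32_C(1) << i;
	}
	return mask;
}

/* Lowest CPU set in the mask, or -1 for an empty mask (model assumption). */
static int model_first_cpu(uint32_t mask)
{
	for (int i = 0; i < 32; i++)
		if (mask & (UINT32_C(1) << i))
			return i;
	return -1;
}

/*
 * Top level as read from the quoted log: only CPU0 (child span 0-2) and
 * CPU3 (child span 1-3) have a second NUMA level; CPU1 and CPU2 do not.
 */
static const struct model_domain model_top_level[MODEL_NR_CPUS] = {
	{ 1, 0x7 },	/* CPU0: child spans CPUs 0-2 */
	{ 0, 0x0 },	/* CPU1: no child at this level */
	{ 0, 0x0 },	/* CPU2: no child at this level */
	{ 1, 0xE },	/* CPU3: child spans CPUs 1-3 */
};
```

Calling model_build_group_mask(0x7, model_top_level, MODEL_NR_CPUS) models group 0-2 and leaves only CPU0 in the mask; passing 0xE models group 1-3 and leaves only CPU3. In both cases the mask is non-empty even though two of the three CPUs in each group are skipped.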