Date:   Wed, 3 Feb 2021 10:27:40 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Valentin Schneider <valentin.schneider@....com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
        "morten.rasmussen@....com" <morten.rasmussen@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "linuxarm@...neuler.org" <linuxarm@...neuler.org>,
        "xuwei (O)" <xuwei5@...wei.com>,
        "Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
        "tiantao (H)" <tiantao6@...ilicon.com>,
        wanghuiqiang <wanghuiqiang@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        "guodong.xu@...aro.org" <guodong.xu@...aro.org>,
        Meelis Roos <mroos@...ux.ee>
Subject: RE: [PATCH] sched/topology: fix the issue groups don't span
 domain->span for NUMA diameter > 2



> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Wednesday, February 3, 2021 11:18 PM
> To: 'Valentin Schneider' <valentin.schneider@....com>;
> vincent.guittot@...aro.org; mgorman@...e.de; mingo@...nel.org;
> peterz@...radead.org; dietmar.eggemann@....com; morten.rasmussen@....com;
> linux-kernel@...r.kernel.org
> Cc: linuxarm@...neuler.org; xuwei (O) <xuwei5@...wei.com>; Liguozhu (Kenneth)
> <liguozhu@...ilicon.com>; tiantao (H) <tiantao6@...ilicon.com>; wanghuiqiang
> <wanghuiqiang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Jonathan
> Cameron <jonathan.cameron@...wei.com>; guodong.xu@...aro.org; Meelis Roos
> <mroos@...ux.ee>
> Subject: RE: [PATCH] sched/topology: fix the issue groups don't span
> domain->span for NUMA diameter > 2
> 
> 
> 
> > -----Original Message-----
> > From: Valentin Schneider [mailto:valentin.schneider@....com]
> > Sent: Wednesday, February 3, 2021 4:17 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>;
> > vincent.guittot@...aro.org; mgorman@...e.de; mingo@...nel.org;
> > peterz@...radead.org; dietmar.eggemann@....com; morten.rasmussen@....com;
> > linux-kernel@...r.kernel.org
> > Cc: linuxarm@...neuler.org; xuwei (O) <xuwei5@...wei.com>; Liguozhu
> (Kenneth)
> > <liguozhu@...ilicon.com>; tiantao (H) <tiantao6@...ilicon.com>;
> wanghuiqiang
> > <wanghuiqiang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Jonathan
> > Cameron <jonathan.cameron@...wei.com>; guodong.xu@...aro.org; Song Bao Hua
> > (Barry Song) <song.bao.hua@...ilicon.com>; Meelis Roos <mroos@...ux.ee>
> > Subject: Re: [PATCH] sched/topology: fix the issue groups don't span
> > domain->span for NUMA diameter > 2
> >
> > On 01/02/21 16:38, Barry Song wrote:
> > > @@ -964,6 +941,12 @@ static void init_overlap_sched_group(struct sched_domain *sd,
> > >
> > >       build_balance_mask(sd, sg, mask);
> > >       cpu = cpumask_first_and(sched_group_span(sg), mask);
> > > +	/*
> > > +	 * for the group generated by grandchild, use the sgc of 2nd cpu
> > > +	 * because the 1st cpu might be used by another sched_group
> > > +	 */
> > > +	if (from_grandchild && cpumask_weight(mask) > 1)
> > > +		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
> > >
> > >       sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
> >
> > So you are getting a (hopefully) unique ID for this group span at this
> > given topology level (i.e. sd->private) but as I had stated in that list of
> > issues, this creates an sgc that isn't attached to the local group of any
> > sched_domain, and thus won't get its capacity values updated.
> >
> > This can actually be seen via the capacity values you're getting at build
> > time:
> >
> > > [    0.868907] CPU0 attaching sched-domain(s):
> > ...
> > > [    0.869542]    domain-2: span=0-5 level=NUMA
> > > [    0.869559]     groups: 0:{ span=0-3 cap=4002 }, 5:{ span=4-5 cap=2048 }
> >                                                           ^^^^^^^^^^^^^^^^
> > > [    0.871177] CPU4 attaching sched-domain(s):
> > ...
> > > [    0.871200]   groups: 4:{ span=4 cap=977 }, 5:{ span=5 cap=1001 }
> > > [    0.871243]   domain-1: span=4-7 level=NUMA
> > > [    0.871257]    groups: 4:{ span=4-5 cap=1978 }, 6:{ span=6-7 cap=1968 }
> >                                 ^^^^^^^^^^^^^^^^
> >
> 
> Yes, I can see this issue. We could hack update_group_capacity() to make it
> scan both the local_group and the sched_group generated by the grandchild,
> but your edit seems much better.
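> 
> (By the way, assuming the capacity initialisation done in
> init_overlap_sched_group(), the numbers above line up: the stale
> 5:{ span=4-5 cap=2048 } is just the build-time default
> 2 * SCHED_CAPACITY_SCALE = 2 * 1024, while the properly maintained
> 4:{ span=4-5 cap=1978 } is the sum of the measured per-CPU capacities
> 977 + 1001 from CPU4's lowest-level groups.)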
> 
> > IMO what we want to do here is to hook this CPU0-domain-2-group5 to the sgc
> > of CPU4-domain1-group4. I've done that in the below diff - this gives us
> > groups with sgc's owned at lower topology levels, but this will only ever
> > be true for non-local groups. This has the added benefit of working with
> > single-CPU nodes. Briefly tested on your topology and the sunfire's (via
> > QEMU), and I didn't get screamed at.
> >
> > Before the fun police comes and impounds my keyboard, I'd like to point out
> > that we could leverage this cross-level sgc referencing hack to further
> > change the NUMA domains and pretty much get rid of overlapping groups
> > (that's what I was fumbling with in [1]).
> >
> > [1]: http://lore.kernel.org/r/jhjwnw11ak2.mognet@arm.com
> >
> > That is, rather than building overlapping groups and fixing them whenever
> > that breaks (distance > 2), we could have:
> > - the local group being the child domain's span (as always)
> > - all non-local NUMA groups spanning a single node each, with the right sgc
> >   cross-referencing.
> >
> > Thoughts?
> 
> I guess the original purpose of overlapping groups is to create as few
> groups as possible. If we remove overlapping groups entirely, it seems we
> will create many more groups?
> For example, when node0 builds its sched_domain for distance 20, it will add
> node2; and since the distance between node2 and node3 is 15, node3 is added
> along with node2 because node2's lower domain already covers node3. So we
> need only two groups for node0's sched_domain at distance level 20.
> +-------+                  +--------+
>  |       |      15          |        |
>  |  node0+----------------+ | node1  |
>  |       |                  |        |
>  +----+--+                XXX--------+
>       |                 XXX
>       |                XX
> 20    |         15   XX
>       |            XXX
>       |       X XXX
>  +----+----XXX               +-------+
>  |         |     15          |  node3|
>  | node2   +-----------------+       |
>  |         |                 +-------+
>  +---------+
> 

Sorry, I missed a line in the diagram:

node0-node3: 20

                             20
                 X XX X  X X  X  X  X
             XXXX                       X   X  X XX
           XX                                     XXX
         XX                                          X
       XX                                              XX
 +-----X-+                  +--------+                  XX
 |       |      15          |        |                   X
 |  node0+----------------+ | node1  |                   X
 |       |                  |        |                   X
 +----+--+                XXX--------+                   X
      |                 XXX                             XX
      |                XX                              XX
20    |         15   XX                             XXXX
      |            XXX                         XXXX
      |       X XXX                        XXXX
 +----+----XXX               +-------+ XXXX
 |         |     15          |  node3|XX
 | node2   +-----------------+       |
 |         |                 +-------+
 +---------+
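
In case anyone wants to reproduce this example, here is a rough sketch (not
tested here) of how such a 4-node topology could be emulated with QEMU. The
node1-node2 and node1-node3 distances are not part of the example above, so
the val=22 entries are only assumptions, and the machine/memory options are
omitted:

  qemu-system-aarch64 ... -smp 8 \
    -numa node,cpus=0-1,nodeid=0 -numa node,cpus=2-3,nodeid=1 \
    -numa node,cpus=4-5,nodeid=2 -numa node,cpus=6-7,nodeid=3 \
    -numa dist,src=0,dst=1,val=15 -numa dist,src=0,dst=2,val=20 \
    -numa dist,src=0,dst=3,val=20 -numa dist,src=2,dst=3,val=15 \
    -numa dist,src=1,dst=2,val=22 -numa dist,src=1,dst=3,val=22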

> If we remove overlapping groups, we will add one group for node2 and another
> for node3, so we end up with three groups.
> 
> I am not sure that is always positive for performance.
> 
> >
> > --->8---
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index b748999c9e11..ef43abb6b1fb 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -932,21 +932,15 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
> >
> >  static void init_overlap_sched_group(struct sched_domain *sd,
> >  				     struct sched_group *sg,
> > -				     int from_grandchild)
> > +				     struct sched_domain *grandchild)
> >  {
> >  	struct cpumask *mask = sched_domains_tmpmask2;
> > -	struct sd_data *sdd = sd->private;
> > +	struct sd_data *sdd = grandchild ? grandchild->private : sd->private;
> >  	struct cpumask *sg_span;
> >  	int cpu;
> >
> >  	build_balance_mask(sd, sg, mask);
> >  	cpu = cpumask_first_and(sched_group_span(sg), mask);
> > -	/*
> > -	 * for the group generated by grandchild, use the sgc of 2nd cpu
> > -	 * because the 1st cpu might be used by another sched_group
> > -	 */
> > -	if (from_grandchild && cpumask_weight(mask) > 1)
> > -		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
> >
> >  	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
> >  	if (atomic_inc_return(&sg->sgc->ref) == 1)
> > @@ -979,7 +973,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
> >
> >  	for_each_cpu_wrap(i, span, cpu) {
> >  		struct cpumask *sg_span;
> > -		int from_grandchild = 0;
> > +		bool from_grandchild = false;
> >
> >  		if (cpumask_test_cpu(i, covered))
> >  			continue;
> > @@ -1033,7 +1027,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
> >  		       !cpumask_subset(sched_domain_span(sibling->child),
> >  				       span)) {
> >  			sibling = sibling->child;
> > -			from_grandchild = 1;
> > +			from_grandchild = true;
> >  		}
> >
> >  		sg = build_group_from_child_sched_domain(sibling, cpu);
> > @@ -1043,7 +1037,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
> >  		sg_span = sched_group_span(sg);
> >  		cpumask_or(covered, covered, sg_span);
> >
> > -		init_overlap_sched_group(sd, sg, from_grandchild);
> > +		init_overlap_sched_group(sd, sg, from_grandchild ? sibling : NULL);
> >
> Nice edit!
> I will merge your edit into v1 and send v2.
> 
> >  		if (!first)
> >  			first = sg;
> 
Thanks
Barry
