Date:   Fri, 4 Feb 2022 12:36:54 +0530
From:   Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Mike Galbraith <efault@....de>,
        Gautham Shenoy <gautham.shenoy@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when
 SD_NUMA spans multiple LLCs

* Mel Gorman <mgorman@...hsingularity.net> [2022-02-03 14:46:52]:

> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index d201a7052a29..e6cd55951304 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2242,6 +2242,59 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  		}
>  	}
> 
> +	/*
> +	 * Calculate an allowed NUMA imbalance such that LLCs do not get
> +	 * imbalanced.
> +	 */

We seem to be adding this hunk before the sched_domains may get
degenerated. I am wondering if we really want to do this before
degeneration.

Let's say we have 3 sched domains and we have calculated sd->imb_numa_nr
for all 3 of them, and then the middle sched_domain gets degenerated.
Would the remaining sd->imb_numa_nr values still be relevant?
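
To make the concern concrete with made-up numbers (purely hypothetical,
not from any real machine): take DIE/NUMA1/NUMA2 spans of 64/128/256
with 4 LLCs of 16 CPUs per node, so the hunk below would compute:

    DIE   (span  64): imb_numa_nr = nr_llcs = 64/16       = 4
    NUMA1 (span 128): imb_numa_nr = 4 * max(1, 128/128)   = 4
    NUMA2 (span 256): imb_numa_nr = 4 * max(1, 256/128)   = 8

If NUMA1 is later degenerated, NUMA2 would still carry imb_numa_nr = 8,
even though the 128-CPU span it was scaled against no longer exists in
the hierarchy.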


> +	for_each_cpu(i, cpu_map) {
> +		unsigned int imb = 0;
> +		unsigned int imb_span = 1;
> +
> +		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> +			struct sched_domain *child = sd->child;
> +
> +			if (!(sd->flags & SD_SHARE_PKG_RESOURCES) && child &&
> +			    (child->flags & SD_SHARE_PKG_RESOURCES)) {
> +				struct sched_domain *top, *top_p;
> +				unsigned int nr_llcs;
> +
> +				/*
> +				 * For a single LLC per node, allow an
> +				 * imbalance up to 25% of the node. This is an
> +				 * arbitrary cutoff based on SMT-2 to balance
> +				 * between memory bandwidth and avoiding
> +				 * premature sharing of HT resources and SMT-4
> +				 * or SMT-8 *may* benefit from a different
> +				 * cutoff.
> +				 *
> +				 * For multiple LLCs, allow an imbalance
> +				 * until multiple tasks would share an LLC
> +				 * on one node while LLCs on another node
> +				 * remain idle.
> +				 */
> +				nr_llcs = sd->span_weight / child->span_weight;
> +				if (nr_llcs == 1)
> +					imb = sd->span_weight >> 2;
> +				else
> +					imb = nr_llcs;
> +				sd->imb_numa_nr = imb;
> +
> +				/* Set span based on the first NUMA domain. */
> +				top = sd;
> +				top_p = top->parent;
> +				while (top_p && !(top_p->flags & SD_NUMA)) {
> +					top = top->parent;
> +					top_p = top->parent;
> +				}
> +				imb_span = top_p ? top_p->span_weight : sd->span_weight;

I am getting confused by imb_span.
Let's say we have a topology of SMT -> MC -> DIE -> NUMA -> NUMA, with the
SMT and MC domains having the SD_SHARE_PKG_RESOURCES flag set.
We reach this branch only for the DIE domain.

The imb_span set here is then used for both of the subsequent sched
domains, which will most likely be NUMA domains. Right?

> +			} else {
> +				int factor = max(1U, (sd->span_weight / imb_span));
> +
> +				sd->imb_numa_nr = imb * factor;

For SMT (or any sched domain below the LLCs), factor would be
sd->span_weight, but imb, and hence imb_numa_nr, would be 0.
For NUMA (or the sched domain just above DIE), factor would be 1, so
sd->imb_numa_nr would be nr_llcs.
For subsequent sched_domains, sd->imb_numa_nr would be some multiple of
nr_llcs. Right?
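
To check my own understanding, here is a tiny user-space sketch that
mimics the hunk above for the SMT -> MC -> DIE -> NUMA -> NUMA example.
The span_weight values (2/16/64/128/256) are made up purely for
illustration:

/*
 * Standalone sketch of the imb/imb_span walk in the hunk above, only to
 * check my understanding.  Hypothetical topology: SMT-2, 4 LLCs of 16
 * CPUs per 64-CPU node, i.e. span_weights 2/16/64/128/256.
 */
#include <stdio.h>

struct dom {
    const char *name;
    unsigned int span_weight;
    int share_pkg;      /* SD_SHARE_PKG_RESOURCES */
    int numa;           /* SD_NUMA */
};

int main(void)
{
    struct dom doms[] = {
        { "SMT",     2, 1, 0 },
        { "MC",     16, 1, 0 },
        { "DIE",    64, 0, 0 },
        { "NUMA1", 128, 0, 1 },
        { "NUMA2", 256, 0, 1 },
    };
    unsigned int imb = 0, imb_span = 1;
    int i;

    for (i = 0; i < 5; i++) {
        struct dom *sd = &doms[i];
        struct dom *child = i > 0 ? &doms[i - 1] : NULL;
        unsigned int imb_numa_nr;

        if (!sd->share_pkg && child && child->share_pkg) {
            /* Only DIE hits this branch in this topology. */
            unsigned int nr_llcs = sd->span_weight / child->span_weight;
            int j = i;

            imb = nr_llcs == 1 ? sd->span_weight >> 2 : nr_llcs;
            imb_numa_nr = imb;

            /* Walk up to the first SD_NUMA parent, like top/top_p. */
            while (j + 1 < 5 && !doms[j + 1].numa)
                j++;
            imb_span = j + 1 < 5 ? doms[j + 1].span_weight : sd->span_weight;
        } else {
            unsigned int factor = sd->span_weight / imb_span;

            if (factor < 1)
                factor = 1;
            imb_numa_nr = imb * factor;
        }
        printf("%-5s span=%3u imb_numa_nr=%u\n",
               sd->name, sd->span_weight, imb_numa_nr);
    }
    return 0;
}

With these made-up numbers it prints imb_numa_nr = 0, 0, 4, 4 and 8 for
SMT, MC, DIE, NUMA1 and NUMA2, i.e. the imb_span of 128 chosen at DIE is
what both NUMA levels scale against, which is what I was trying to
confirm above.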


> +			}
> +		}
> +	}
> +
>  	/* Calculate CPU capacity for physical packages and nodes */
>  	for (i = nr_cpumask_bits-1; i >= 0; i--) {
>  		if (!cpumask_test_cpu(i, cpu_map))
> -- 
> 2.31.1
> 

-- 
Thanks and Regards
Srikar Dronamraju
