Date:   Tue, 13 Feb 2018 11:45:41 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Mike Galbraith <efault@....de>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched/fair: Consider SD_NUMA when selecting the most
 idle group to schedule on

On Mon, Feb 12, 2018 at 05:11:30PM +0000, Mel Gorman wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 50442697b455..0192448e43a2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5917,6 +5917,18 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>  	if (!idlest)
>  		return NULL;
>  
> +	/*
> +	 * When comparing groups across NUMA domains, it's possible for the
> +	 * local domain to be very lightly loaded relative to the remote
> +	 * domains but "imbalance" skews the comparison making remote CPUs
> +	 * look much more favourable. When considering cross-domain, add
> +	 * imbalance to the runnable load on the remote node and consider
> +	 * staying local.
> +	 */
> +	if ((sd->flags & SD_NUMA) &&
> +	    min_runnable_load + imbalance >= this_runnable_load)
> +		return NULL;
> +
>  	if (min_runnable_load > (this_runnable_load + imbalance))
>  		return NULL;

So this is basically a spread vs group decision, which we typically do
using SD_PREFER_SIBLING. Now that flag is a bit awkward in that it's set
on the child domain.
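
To illustrate the point about the flag living on the child domain, here is a minimal standalone sketch. It is a toy model, not kernel code: `struct toy_domain`, the `TOY_*` flag values and `toy_prefer_spread()` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy flag bits; the values are illustrative, not the kernel's. */
#define TOY_SD_PREFER_SIBLING 0x1u
#define TOY_SD_NUMA 0x2u

/* Toy stand-in for struct sched_domain: only the fields needed here. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *child;	/* next lower level, NULL at the bottom */
};

/*
 * Should balancing at level @sd spread tasks across its groups?
 * The answer is read from the child's flags, not from @sd itself,
 * which is the awkward part.
 */
static bool toy_prefer_spread(const struct toy_domain *sd)
{
	return sd->child && (sd->child->flags & TOY_SD_PREFER_SIBLING);
}
```

In this model, a NUMA-level domain whose child is an LLC domain carrying the flag reports that it wants to spread, while the LLC domain itself (with no child) does not.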

Now, we set it for SD_SHARE_PKG_RESOURCES (aka LLC), which means that for
our typical modern NUMA system we indicate we want to spread across the
lowest NUMA level. And regular load balancing will do so.

Now you modify the idlest code for initial placement to go against the
stable behaviour, which is unfortunate.

However, if we have NUMA balancing enabled, that will counteract
the normal spreading across nodes, so in that regard it makes sense, but
the above code is not conditional on NUMA balancing.
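
One way to picture a version of the quoted check that is conditional on NUMA balancing is the standalone sketch below. It is not a proposed patch. It assumes that returning NULL from find_idlest_group() means "stay on the local group", and it models that as a boolean. The `numa_balancing_enabled` parameter, the `TOY_SD_NUMA` value and `toy_stay_local()` are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Toy flag bit; the value is illustrative, not the kernel's. */
#define TOY_SD_NUMA 0x2u

/*
 * Returns true when initial placement should stay on the local group.
 * @numa_balancing_enabled is a hypothetical parameter standing in for
 * whatever runtime check the kernel would use.
 */
static bool toy_stay_local(unsigned int sd_flags, bool numa_balancing_enabled,
			   unsigned long this_runnable_load,
			   unsigned long min_runnable_load,
			   unsigned long imbalance)
{
	/* The check added by the quoted patch, additionally gated. */
	if (numa_balancing_enabled && (sd_flags & TOY_SD_NUMA) &&
	    min_runnable_load + imbalance >= this_runnable_load)
		return true;

	/* The pre-existing check from the quoted context lines. */
	if (min_runnable_load > this_runnable_load + imbalance)
		return true;

	return false;
}
```

With a local load of 100, a remote minimum of 80 and an imbalance of 30, this model stays local only when the domain is NUMA and the hypothetical gate is on. With the gate off it falls through to the existing check, which allows the remote group.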

I'm torn and confused...
