Date: Tue, 13 Feb 2018 15:43:26 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Ingo Molnar <mingo@...nel.org>, Mike Galbraith <efault@....de>,
	Matt Fleming <matt@...eblueprint.co.uk>,
	Giovanni Gherdovich <ggherdovich@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 6/6] sched/numa: Delay retrying placement for automatic
	NUMA balance after wake_affine

On Tue, Feb 13, 2018 at 02:18:12PM +0000, Mel Gorman wrote:
> On Tue, Feb 13, 2018 at 03:01:37PM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 13, 2018 at 01:37:30PM +0000, Mel Gorman wrote:
> > > +static void
> > > +update_wa_numa_placement(struct task_struct *p, int prev_cpu, int target)
> > > +{
> > > +	unsigned long interval;
> > > +
> > > +	if (!static_branch_likely(&sched_numa_balancing))
> > > +		return;
> > > +
> > > +	/* If balancing has no preference then continue gathering data */
> > > +	if (p->numa_preferred_nid == -1)
> > > +		return;
> > > +
> > > +	/*
> > > +	 * If the wakeup is not affecting locality then it is neutral from
> > > +	 * the perspective of NUMA balancing so continue gathering data.
> > > +	 */
> > > +	if (cpus_share_cache(prev_cpu, target))
> > > +		return;
> >
> > Dang, I wanted to mention this before, but it slipped my mind. The
> > comment and code don't match.
> >
> > Did you want to write:
> >
> >	if (cpu_to_node(prev_cpu) == cpu_to_node(target))
> >		return;
>
> Well, it was deliberate. While it's possible to be on the same memory
> node and not share a cache, the scheduler is typically more concerned
> with the LLC than with NUMA per se. If two CPUs share an LLC, then I
> also assume that they share memory locality.

True, but the remaining code only has an effect for NUMA balancing, which
is concerned with nodes. So I don't see the point of using something
potentially smaller.

Suppose someone built hardware where a node has 2 cache clusters; then
we'd still set a wake_affine back-off for NUMA balancing, even though the
task remains on the same node. How would that be useful?
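[Editor's note: for readers skimming the thread, a minimal sketch of the
node-granularity variant Peter is proposing, in the context of the quoted
function. cpu_to_node() is the existing kernel helper mapping a CPU to its
NUMA node; the elided parts of the function body are marked "..." as in
the quoted patch. This is illustrative only, not the merged code.]

	static void
	update_wa_numa_placement(struct task_struct *p, int prev_cpu, int target)
	{
		...
		/*
		 * NUMA balancing tracks placement per node, and two CPUs
		 * can share a node without sharing an LLC (e.g. a node
		 * with two cache clusters), so compare nodes directly
		 * instead of calling cpus_share_cache().
		 */
		if (cpu_to_node(prev_cpu) == cpu_to_node(target))
			return;
		...
	}

With this check, a wakeup that crosses cache clusters but stays on the
same node would no longer trigger the wake_affine back-off, which is the
behaviour Peter is arguing for above.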