Date:	Tue, 12 May 2015 11:45:09 -0400
From:	Rik van Riel <riel@...hat.com>
To:	dedekind1@...il.com
CC:	linux-kernel@...r.kernel.org, mgorman@...e.de,
	peterz@...radead.org, jhladky@...hat.com
Subject: Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing
 destination

On 05/12/2015 09:50 AM, Artem Bityutskiy wrote:
> On Fri, 2015-05-08 at 16:03 -0400, Rik van Riel wrote:
>> Currently the load balancer has a preference for moving
>> tasks to their preferred nodes (NUMA_FAVOUR_HIGHER, true),
>> but there is no resistance to moving tasks away from their
>> preferred nodes (NUMA_RESIST_LOWER, false).  That setting
>> was arrived at after a fair amount of experimenting, and
>> is probably correct.
> 
> FYI, (NUMA_RESIST_LOWER, true) does not make any difference for me.

I am not surprised by this.

The idle balancing code will simply take a runnable-but-not-running
task off the run queue of the busiest CPU in the system. On a system
with some idle time, it is likely there are only one or two tasks
available on the run queue of the busiest CPU, which leaves little or
no choice to the NUMA_FAVOUR_HIGHER and NUMA_RESIST_LOWER code.

The idle balancing code, through find_busiest_queue(), already tries
to select a CPU where at least one of the runnable tasks is on the
wrong NUMA node.

However, that task may well be the current task, leading us to steal
the other (runnable but queued) task instead, moving that one to the
wrong NUMA node.

I have a few poorly formed ideas on what could be done about that:

1) have fbq_classify_rq take the current task on the rq into account,
   and adjust the fbq classification if all the runnable-but-queued
   tasks are on the right node

2) ensure that rq->nr_numa_running and rq->nr_preferred_running also
   get incremented for kernel threads that are bound to a particular
   CPU - currently CPU-bound kernel threads will cause the NUMA
   statistics to look like a CPU has tasks that do not belong on that
   NUMA node

3) have detach_tasks take env->fbq_type into account when deciding
   whether to look at NUMA affinity at all

4) maybe have detach_tasks fail if env->fbq_type is regular or remote,
   but no !numa or on-the-wrong-node tasks were found ?  not sure if
   that would cause problems, or what kind...

-- 
All rights reversed
