linux-kernel - Re: [PATCH] sched: wake up task on prev_cpu if not in SD_WAKE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1400040489.5157.44.camel@marge.simpson.net>
Date:	Wed, 14 May 2014 06:08:09 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Rik van Riel <riel@...hat.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, morten.rasmussen@....com,
	mingo@...nel.org, george.mccollister@...il.com,
	ktkhai@...allels.com, Mel Gorman <mgorman@...e.de>,
	"Vinod, Chegu" <chegu_vinod@...com>,
	Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: [PATCH] sched: wake up task on prev_cpu if not in
 SD_WAKE_AFFINE domain with cpu

On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:

> OK, after doing some other NUMA stuff, and then looking at the scheduler
> again with a fresh mind, I have drawn some more conclusions about what
> the scheduler does, and how it breaks NUMA locality :)
> 
> 1) If the node_distance between nodes on a NUMA system is
>    <= RECLAIM_DISTANCE, we will call select_idle_sibling for
>    a wakeup of a previously existing task (SD_BALANCE_WAKE)
> 
> 2) If the node distance exceeds RECLAIM_DISTANCE, we will
>    wake up a task on prev_cpu, even if it is not currently
>    idle
> 
>    This behaviour only happens on certain large NUMA systems,
>    and is different from the behaviour on small systems.
>    I suspect we will want to call select_idle_sibling with
>    prev_cpu in case target and prev_cpu are not in the same
>    SD_WAKE_AFFINE domain.

Sometimes.  It's the same can of worms remote as it is local.. latency
gain may or may not outweigh cache miss pain.

> 3) If wake_wide is false, we call select_idle_sibling with
>    the CPU number of the code that is waking up the task
> 
> 4) If wake_wide is true, we call select_idle_sibling with
>    the CPU number the task was previously running on (prev_cpu)
> 
>    In effect, the "wake task on waking task's CPU" behaviour
>    is the default, regardless of how frequently a task wakes up
>    its wakee, and regardless of impact on NUMA locality.
> 
>    This may need to be changed.

That behavior also improves the odds of communicating tasks sharing a
cache though.

> Am I overlooking anything?

No, I think you're seeing where the worms live. 

> What benchmarks should I run to test any changes I make?

Mixed bag, it'll affects all, bursty, static, ramp up/down.

-Mike



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/