Message-ID: <1400040489.5157.44.camel@marge.simpson.net>
Date: Wed, 14 May 2014 06:08:09 +0200
From: Mike Galbraith <umgwanakikbuti@...il.com>
To: Rik van Riel <riel@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, morten.rasmussen@....com,
mingo@...nel.org, george.mccollister@...il.com,
ktkhai@...allels.com, Mel Gorman <mgorman@...e.de>,
"Vinod, Chegu" <chegu_vinod@...com>,
Suresh Siddha <suresh.b.siddha@...el.com>
Subject: Re: [PATCH] sched: wake up task on prev_cpu if not in
SD_WAKE_AFFINE domain with cpu
On Tue, 2014-05-13 at 10:08 -0400, Rik van Riel wrote:
> OK, after doing some other NUMA stuff, and then looking at the scheduler
> again with a fresh mind, I have drawn some more conclusions about what
> the scheduler does, and how it breaks NUMA locality :)
>
> 1) If the node_distance between nodes on a NUMA system is
> <= RECLAIM_DISTANCE, we will call select_idle_sibling for
> a wakeup of a previously existing task (SD_BALANCE_WAKE)
>
> 2) If the node distance exceeds RECLAIM_DISTANCE, we will
> wake up a task on prev_cpu, even if it is not currently
> idle
>
> This behaviour only happens on certain large NUMA systems,
> and is different from the behaviour on small systems.
> I suspect we will want to call select_idle_sibling with
> prev_cpu in case target and prev_cpu are not in the same
> SD_WAKE_AFFINE domain.
Sometimes. It's the same can of worms remote as it is local: the
latency gain may or may not outweigh the cache-miss pain.
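
(For reference, a heavily simplified sketch of the path in question as
I read it -- mainline names, but this is pseudo-C from memory, not the
literal code, and the cpus_allowed/sync details are left out:

/* Sketch only: the wakeup decision described in 1) and 2) above. */
static int sketch_wake_target(struct task_struct *p, int cpu, int prev_cpu)
{
	struct sched_domain *tmp, *affine_sd = NULL;

	for_each_domain(cpu, tmp) {
		/*
		 * sd_numa_init() clears SD_WAKE_AFFINE on NUMA levels whose
		 * node_distance exceeds RECLAIM_DISTANCE, so on big boxes
		 * no domain in this walk spans both the waking cpu and
		 * prev_cpu.
		 */
		if ((tmp->flags & SD_WAKE_AFFINE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			affine_sd = tmp;
			break;
		}
	}

	if (affine_sd) {
		/*
		 * 1) close enough: idle-sibling search.  Which cpu we hand
		 * it is the 3)/4) business below; sync assumed 0 here.
		 */
		int target = wake_affine(affine_sd, p, 0) ? cpu : prev_cpu;
		return select_idle_sibling(p, target);
	}

	/* 2) too far apart: stick with prev_cpu, idle or not. */
	return prev_cpu;
}

So whether prev_cpu gets the idle-sibling treatment at all currently
hinges on the distance cutoff, not on what the wakee would like.)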
> 3) If wake_wide is false, we call select_idle_sibling with
> the CPU number of the waker (the CPU running the code that
> is waking up the task)
>
> 4) If wake_wide is true, we call select_idle_sibling with
> the CPU number the task was previously running on (prev_cpu)
>
> In effect, the "wake task on waking task's CPU" behaviour
> is the default, regardless of how frequently a task wakes up
> its wakee, and regardless of impact on NUMA locality.
>
> This may need to be changed.
That behavior also improves the odds of communicating tasks sharing a
cache though.
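
FWIW, the knob deciding 3) vs 4) is the wakee-flip heuristic inside
wake_affine(); roughly (again a from-memory sketch, not the literal
code):

/*
 * Sketch of wake_wide(): wake_affine() bails out -- i.e. the wakee
 * stays on prev_cpu -- only when the waker switches wakees often
 * enough relative to LLC size; otherwise pulling the wakee toward the
 * waker's cache is the default, as you say.
 */
static int sketch_wake_wide(struct task_struct *p)
{
	int factor = this_cpu_read(sd_llc_size);

	if (p->wakee_flips > factor &&
	    current->wakee_flips > (factor * p->wakee_flips))
		return 1;	/* wake wide: leave wakee on prev_cpu */

	return 0;		/* default: pull toward the waker */
}

That default is what buys the shared-cache win above; it just neither
knows nor cares about node distance.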
> Am I overlooking anything?
No, I think you're seeing where the worms live.
> What benchmarks should I run to test any changes I make?
Mixed bag; it'll affect them all: bursty, static, ramp up/down.
-Mike