Message-ID: <53633645.9090308@redhat.com>
Date: Fri, 02 May 2014 02:08:05 -0400
From: Rik van Riel <riel@...hat.com>
To: Mike Galbraith <umgwanakikbuti@...il.com>
CC: linux-kernel@...r.kernel.org, morten.rasmussen@....com,
mingo@...nel.org, peterz@...radead.org,
george.mccollister@...il.com, ktkhai@...allels.com
Subject: Re: [PATCH RFC/TEST] sched: make sync affine wakeups work

On 05/02/2014 01:58 AM, Mike Galbraith wrote:
> On Fri, 2014-05-02 at 07:32 +0200, Mike Galbraith wrote:
>> On Fri, 2014-05-02 at 00:42 -0400, Rik van Riel wrote:
>>> Currently sync wakeups from the wake_affine code cannot work as
>>> designed, because the task doing the sync wakeup from the target
>>> cpu will block its wakee from selecting that cpu.
>>>
>>> This is despite the fact that whether or not the wakeup is sync
>>> determines whether or not we want to do an affine wakeup...
>>
>> If the sync hint really did mean we ARE going to schedule RSN, waking
>> local would be a good thing. It is all too often a big fat lie.
>
> One example of that is say pgbench. The mother of all work (server
> thread) for that load wakes with sync hint. Let the server wake the
> first of a small herd CPU affine, and that first wakee then preempt the
> server (mother of all work) that drives the entire load.
>
> Byebye throughput.
>
> When there's only one wakee, and there's really not enough overlap to at
> least break even, waking CPU affine is a great idea. Even when your
> wakees only run for a short time, if you wake/get_preempted repeat, the
> load will serialize.

I see a similar issue with specjbb2013, with 4 backend and
4 frontend JVMs on a 4 node NUMA system.

The NUMA balancing code nicely places the memory of each JVM
on one NUMA node, but then the wake_affine code will happily
run all of the threads anywhere on the system, totally ruining
memory locality.

The front end and back end only exchange a few hundred messages
a second, over loopback tcp, so the switching rate between
threads is quite low...

I wonder if it would make sense for wake_affine to be off by
default, and only switch on when the right conditions are
detected, instead of having it on by default like we have now?
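
To make the "off by default" question concrete, here is a purely
hypothetical user-space sketch of such a policy. It is not kernel
code and not the test patch from this thread: the struct, its
fields, the function name and the 1000/sec threshold are all
invented for illustration. The assumption is that an affine wakeup
only pays off when waker and wakee switch often, and that it should
never pull a task away from the NUMA node holding its memory.

```c
#include <stdbool.h>

/* Hypothetical per-task bookkeeping; these fields do not exist in the kernel. */
struct task_stats {
	unsigned int wakeups_per_sec;	/* observed waker/wakee switching rate */
	int preferred_node;		/* NUMA node with the task's memory, -1 if unknown */
};

/* Invented threshold: below this rate a cache-hot handoff is assumed not to pay. */
#define AFFINE_MIN_WAKEUPS_PER_SEC 1000u

/*
 * Decide whether to pull the wakee to the waker's node.
 * The default answer is "no"; affine wakeups are only switched on
 * when the pair switches often and the move keeps the wakee on the
 * node that holds its memory (or that node is unknown).
 */
static bool want_affine(const struct task_stats *wakee, int waker_node)
{
	if (wakee->wakeups_per_sec < AFFINE_MIN_WAKEUPS_PER_SEC)
		return false;
	if (wakee->preferred_node >= 0 && wakee->preferred_node != waker_node)
		return false;
	return true;
}
```

With these made-up numbers, a thread that sees a few hundred wakeups
a second stays where it is, while a task woken tens of thousands of
times a second with no node preference is pulled to the waker.
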

I have some ideas on that, but I should probably catch some
sleep before trying to code them up :)

Meanwhile, the test patch that I posted may help us figure out
whether the "sync" option in the current wake_affine code does
anything useful.

--
All rights reversed