Date:	Fri, 02 May 2014 02:08:05 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>
CC:	linux-kernel@...r.kernel.org, morten.rasmussen@....com,
	mingo@...nel.org, peterz@...radead.org,
	george.mccollister@...il.com, ktkhai@...allels.com
Subject: Re: [PATCH RFC/TEST] sched: make sync affine wakeups work

On 05/02/2014 01:58 AM, Mike Galbraith wrote:
> On Fri, 2014-05-02 at 07:32 +0200, Mike Galbraith wrote: 
>> On Fri, 2014-05-02 at 00:42 -0400, Rik van Riel wrote: 
>>> Currently sync wakeups from the wake_affine code cannot work as
>>> designed, because the task doing the sync wakeup from the target
>>> cpu will block its wakee from selecting that cpu.
>>>
>>> This is despite the fact that whether or not the wakeup is sync
>>> determines whether or not we want to do an affine wakeup...
>>
>> If the sync hint really did mean we ARE going to schedule RSN, waking
>> local would be a good thing.  It is all too often a big fat lie.
> 
> One example of that is pgbench.  The mother of all work (the server
> thread) for that load wakes with the sync hint.  If the server wakes the
> first of a small herd CPU affine, that first wakee then preempts the
> server (the mother of all work) that drives the entire load.
> 
> Byebye throughput.
> 
> When there's only one wakee, and there's really not enough overlap to at
> least break even, waking CPU affine is a great idea.  Even when your
> wakees only run for a short time, if you repeat the wake/get-preempted
> cycle, the load will serialize.

I see a similar issue with specjbb2013, with 4 backend and
4 frontend JVMs on a 4 node NUMA system.

The NUMA balancing code nicely places the memory of each JVM
on one NUMA node, but then the wake_affine code will happily
run all of the threads anywhere on the system, totally ruining
memory locality.

The front end and back end only exchange a few hundred messages
a second, over loopback TCP, so the switching rate between
threads is quite low...

I wonder if it would make sense for wake_affine to be off by
default, switching on only when the right conditions are
detected, rather than always on as it is now?

I have some ideas on that, but I should probably catch some
sleep before trying to code them up :)
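
Just to give a rough idea of the direction, such a gate could look
something like the sketch below; the statistics and thresholds are
purely illustrative, not from any posted patch:

	/*
	 * Purely illustrative sketch of enabling affine wakeups only when
	 * the observed behaviour of a waker/wakee pair justifies it.
	 * None of these fields or thresholds exist in the kernel today.
	 */
	struct wake_stats {
		unsigned int	waker_wakee_switches;	/* recent switch rate between the pair */
		unsigned int	sync_hints_honoured;	/* sync wakeups where the waker really slept */
		unsigned int	sync_hints_total;
	};

	static int want_affine_wakeup(const struct wake_stats *ws, int sync)
	{
		/* Default off: only pull the wakee when conditions look right. */
		if (!sync)
			return 0;

		/* Only trust the sync hint when it has mostly been truthful. */
		if (ws->sync_hints_total &&
		    4 * ws->sync_hints_honoured < 3 * ws->sync_hints_total)
			return 0;

		/* A high switch rate suggests a tightly coupled pair worth co-locating. */
		return ws->waker_wakee_switches > 100;	/* threshold made up */
	}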

Meanwhile, the test patch that I posted may help us figure out
whether the "sync" option in the current wake_affine code does
anything useful.

-- 
All rights reversed
