Message-ID: <51A9BCDA.9040006@colorfullife.com>
Date: Sat, 01 Jun 2013 11:20:26 +0200
From: Manfred Spraul <manfred@...orfullife.com>
To: Rik van Riel <riel@...hat.com>
CC: LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Davidlohr Bueso <davidlohr.bueso@...com>, hhuang@...hat.com,
Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Galbraith <efault@....de>
Subject: Re: [PATCH 2/4] ipc/sem: seperate wait-for-zero and alter tasks into
seperate queues
Hi Rik,
On 05/27/2013 07:57 PM, Rik van Riel wrote:
> On 05/26/2013 05:08 AM, Manfred Spraul wrote:
>> Introduce seperate queues for operations that do not modify the
>> semaphore values.
>> Advantages:
>> - Simpler logic in check_restart().
>> - Faster update_queue(): Right now, all wait-for-zero operations
>> are always tested, even if the semaphore value is not 0.
>> - wait-for-zero gets again priority, as in linux <=3.0.9
>
> Whether this complexity is wanted is not for
> me to decide, as I am not the ipc/sem.c
> maintainer. I'll leave that up to Andrew and Linus.
>
We can only have one of the two: either more logic or unoptimized loops.
But I don't think the complexity increases that much; some parts
(e.g. check_restart()) actually get much simpler.
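To illustrate the idea (a rough sketch only, with made-up field names,
not the actual structures from the patch): the pending list of each
semaphore is split in two, so that update_queue() only has to look at
the wait-for-zero entries when semval actually becomes 0.

#include <linux/list.h>

/*
 * Sketch only: one list for blocked operations that would alter
 * semval, one for blocked wait-for-zero operations.
 */
struct sem_sketch {
	int			semval;		/* current semaphore value */
	struct list_head	pending_alter;	/* blocked ops that alter semval */
	struct list_head	pending_const;	/* blocked wait-for-zero ops */
};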
But:
Mike Galbraith ran 3.10-rc3 with and without my changes on a 4-socket
64-core system, and to me the results appear to be quite slow:
- semop-multi 256 64: around 600,000 ops/sec, both with and without my
  additional patches [difference around 1%].
  That is slower than my 1.4 GHz i3 running 3.9, where I get around
  1,000,000 ops/sec.
  Is that expected?
  My only idea would be thrashing caused by writing sma->sem_otime.
- osim [i.e. with reschedules] is much slower: around 21 us per schedule.
  Perhaps the scheduler didn't pair the threads optimally: intra-cpu
  reschedules take around 2 us on my i3, inter-cpu reschedules around 16 us.
Thus I have attached my test apps.
- psem: tests sleeping semaphore operations.
  Pairs of threads perform ping-pong operations, starting with 1
  semaphore and increasing up to the given maximum.
  The threads are either bound to the same cpu ("intra-cpu") or to
  different cpus ("inter-cpu").
  The inter-cpu distance is hardcoded, probably always a different
  socket (distance max_cpus/2).
  (A simplified sketch of one such ping-pong pair follows after this list.)
- semscale performs operations that never block, i.e. like your
  semop-multi.c.
  It additionally does two things:
  - add delays in user space, to figure out the maximum number of
    operations possible when user space does some work between semops.
  - use interleaving, to force the threads onto different cores/sockets.
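For reference, the core of one psem ping-pong pair boils down to
something like the sketch below (simplified and from memory, not the
attached psem.cpp; error handling, timing and the cpu selection logic
are omitted):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

static int semid;

static void bind_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	sched_setaffinity(0, sizeof(set), &set);	/* 0: calling thread */
}

static void *pingpong(void *arg)
{
	int me = (int)(long)arg;			/* thread 0 or 1 */
	struct sembuf wake = { 1 - me, 1, 0 };		/* wake the partner */
	struct sembuf wait = { me, -1, 0 };		/* sleep until woken */
	int i;

	bind_to_cpu(me);	/* same cpu for "intra-cpu", different for "inter-cpu" */
	for (i = 0; i < 100000; i++) {
		semop(semid, &wake, 1);
		semop(semid, &wait, 1);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[2];
	long i;

	semid = semget(IPC_PRIVATE, 2, 0600);
	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, pingpong, (void *)i);
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	semctl(semid, 0, IPC_RMID);
	return 0;
}

The attached psem.cpp does essentially this, with timing around the
loop and with an increasing number of such pairs.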
Perhaps something in 3.10-rc3 breaks the scalability?
--
Manfred
Attachment: "psem.cpp" (text/x-c++src, 7123 bytes)
Attachment: "semscale.cpp" (text/x-c++src, 7758 bytes)