Message-ID: <1370080999.7268.5.camel@marge.simpson.net>
Date: Sat, 01 Jun 2013 12:03:19 +0200
From: Mike Galbraith <efault@....de>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Rik van Riel <riel@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Davidlohr Bueso <davidlohr.bueso@...com>, hhuang@...hat.com,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks
into separate queues
On Sat, 2013-06-01 at 11:20 +0200, Manfred Spraul wrote:
> Hi Rik,
>
> On 05/27/2013 07:57 PM, Rik van Riel wrote:
> > On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> >> Introduce separate queues for operations that do not modify the
> >> semaphore values.
> >> Advantages:
> >> - Simpler logic in check_restart().
> >> - Faster update_queue(): Right now, all wait-for-zero operations
> >> are always tested, even if the semaphore value is not 0.
> >> - wait-for-zero again gets priority, as in linux <= 3.0.9
> >
> > Whether this complexity is wanted is not for
> > me to decide, as I am not the ipc/sem.c
> > maintainer. I'll leave that up to Andrew and Linus.
> >
> We can only have one or the other: either more logic or unoptimized loops.
> But I don't think the complexity increases that much; some parts
> (e.g. check_restart()) get much simpler.
>
> But:
> Mike Galbraith ran 3.10-rc3 with and without my changes on a 4-socket
> 64-core system, and the results appear quite slow to me:
> - semop-multi 256 64: around 600,000 ops/sec, both with and without my
> additional patches [difference around 1%]
> That is slower than my 1.4 GHz i3 with 3.9 - I get around 1,000,000
> ops/sec.
> Is that expected?
> My only idea would be thrashing from writing sma->sem_otime.
>
> - osim [i.e.: with reschedules] is much slower: around 21 us per schedule.
> Perhaps the scheduler didn't pair the threads optimally: intra-cpu
> reschedules take around 2 us on my i3, inter-cpu reschedules around 16 us.
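
For reference, the queue split described in the quoted patch boils down to
giving each semaphore two pending lists instead of one, so that a wakeup
scan only has to walk the wait-for-zero sleepers when a semval actually
reaches zero. A rough sketch of the idea only - field names are
illustrative, not necessarily what the patch uses:

#include <linux/list.h>
#include <linux/spinlock.h>

struct sem {
	int semval;			/* current value */
	int sempid;			/* pid of last operation */
	spinlock_t lock;		/* per-semaphore lock */
	struct list_head pending_alter;	/* sleepers whose ops change semval */
	struct list_head pending_const;	/* wait-for-zero sleepers */
};

/*
 * With the split, update_queue() only needs to look at pending_const
 * when semval dropped to 0, and only at pending_alter when semval
 * was increased.
 */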
I got -rt backports working, and it's faster at semop-multi, but one
hell of a lot slower at osim. The goto-again loop in sem_lock() is a
livelock in -rt, so I had to do that differently.
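
For context, the loop in question is roughly the following fast path from
the 3.10-era sem_lock() - a trimmed reconstruction, not the exact mainline
code, and not what the -rt backport ends up doing instead:

again:
	if (nsops == 1 && !sma->complex_count) {
		struct sem *sem = sma->sem_base + sops->sem_num;

		/* Lock only the semaphore we are interested in. */
		spin_lock(&sem->lock);

		/* A complex op sneaked in: fall back to the global lock. */
		if (unlikely(sma->complex_count)) {
			spin_unlock(&sem->lock);
			goto lock_array;
		}

		/* Someone holds the global lock: wait for it, then retry. */
		if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
			spin_unlock(&sem->lock);
			spin_unlock_wait(&sma->sem_perm.lock);
			goto again;
		}

		return sops->sem_num;
	}

lock_array:
	/* complex operation: take the global sma->sem_perm.lock instead */
	...

With -rt's sleeping spinlocks, that spin_unlock_wait()/goto-again retry
presumably never settles under load, hence the livelock.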
-rtm is the backport with our patches added, -rt is the backport without.
vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rtm
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33553800, ops/sec 1118460
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33344598, ops/sec 1111486
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33655348, ops/sec 1121844
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[130]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 12.296215 seconds for 1000192 loops
per loop execution time: 12.293 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 11.613663 seconds for 1000192 loops
per loop execution time: 11.611 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 13.755537 seconds for 1000192 loops
per loop execution time: 13.752 usec
vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rt
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 37343656, ops/sec 1244788
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 37226496, ops/sec 1240883
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 36730816, ops/sec 1224360
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 12.676632 seconds for 1000192 loops
per loop execution time: 12.674 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 14.166756 seconds for 1000192 loops
per loop execution time: 14.164 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 15.116200 seconds for 1000192 loops
per loop execution time: 15.113 usec