Message-ID: <51A9BCDA.9040006@colorfullife.com>
Date:	Sat, 01 Jun 2013 11:20:26 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	Rik van Riel <riel@...hat.com>
CC:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Davidlohr Bueso <davidlohr.bueso@...com>, hhuang@...hat.com,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into
 separate queues

Hi Rik,

On 05/27/2013 07:57 PM, Rik van Riel wrote:
> On 05/26/2013 05:08 AM, Manfred Spraul wrote:
>> Introduce separate queues for operations that do not modify the
>> semaphore values.
>> Advantages:
>> - Simpler logic in check_restart().
>> - Faster update_queue(): Right now, all wait-for-zero operations
>>    are always tested, even if the semaphore value is not 0.
>> - wait-for-zero gets again priority, as in linux <=3.0.9
>
> Whether this complexity is wanted is not for
> me to decide, as I am not the ipc/sem.c
> maintainer. I'll leave that up to Andrew and Linus.
>
We can only have one of the two: either more logic or unoptimized loops.
But I don't think that the complexity increases that much; some parts 
(e.g. check_restart()) actually get much simpler.

But:
Mike Galbraith ran 3.10-rc3 with and without my changes on a 4-socket, 
64-core system, and to me the results appear to be quite slow:
- semop-multi 256 64: around 600,000 ops/sec, both with and without my 
additional patches [difference around 1%].
     That is slower than my 1.4 GHz i3 with 3.9 - I get around 1,000,000 
ops/sec.
     Is that expected?
     My only idea would be thrashing from writing sma->sem_otime.

- osim [i.e. with reschedules] is much slower: around 21 us per reschedule.
     Perhaps the scheduler didn't pair the threads optimally: intra-cpu 
reschedules take around 2 us on my i3, inter-cpu reschedules around 16 us.

Thus I have attached my test apps.
- psem: psem tests sleeping semaphore operations.
     Pairs of threads perform ping-pong operations, starting with 1 
semaphore and increasing up to the given maximum.
     The threads are either bound to the same cpu ("intra-cpu") or to 
different cpus ("inter-cpu").
     Inter-cpu is hardcoded, probably always a different socket 
(distance max_cpus/2).

- semscale performs operations that never block, i.e. like your 
semop-multi.c.
     It does:
     - delays in user space, to figure out the maximum number of 
operations possible when user space also does some work.
     - interleaving, to force the threads onto different cores/sockets.

Perhaps something in 3.10-rc3 breaks the scalability?

--
     Manfred

View attachment "psem.cpp" of type "text/x-c++src" (7123 bytes)

View attachment "semscale.cpp" of type "text/x-c++src" (7758 bytes)
