linux-kernel - Re: [PATCH 1/2] ipc semaphores: reduce ipc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100414173319.GA3228@think>
Date:	Wed, 14 Apr 2010 13:33:19 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Manfred Spraul <manfred@...orfullife.com>
Cc:	Nick Piggin <npiggin@...e.de>, zach.brown@...cle.com,
	jens.axboe@...cle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in
 semtimedop

On Wed, Apr 14, 2010 at 06:16:53PM +0200, Manfred Spraul wrote:
> On 04/13/2010 08:19 PM, Chris Mason wrote:
> >On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote:
> >>On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote:
> >>The other thing I don't know if your patch gets right is requeueing on
> >>of the operations. When you requeue from one list to another, then you
> >>seem to lose ordering with other pending operations, so that would
> >>seem to break the API as well (can't remember if the API strictly
> >>mandates FIFO, but anyway it can open up starvation cases).
> >I don't see anything in the docs about the FIFO order.  I could add an
> >extra sort on sequence number pretty easily, but is the starvation case
> >really that bad?
> >
> How do you want to determine the sequence number?
> Is atomic_inc_return() on a per-semaphore array counter sufficiently fast?

I haven't tried yet, but hopefully it won't be a problem.  A later patch
does atomics on the reference count and it doesn't show up in the
profiles.

> 
> >>I was looking at doing a sequence number to be able to sort these, but
> >>it ended up getting over complex (and SAP was only using simple ops so
> >>it didn't seem to need much better).
> >>
> >>We want to be careful not to change semantics at all. And it gets
> >>tricky quickly :( What about Zach's simpler wakeup API?
> >Yeah, that's why my patches include code to handle userland sending
> >duplicate semids.  Zach's simpler API is cooking too, but if I can get
> >this done without insane complexity it helps with more than just the
> >post/wait oracle workload.
> >
> What is the oracle workload, which multi-sembuf operations does it use?
> How many semaphores are in one array?
> 
> When the last optimizations were written, I've searched a bit:
> - postgres uses per-process semaphores, with small semaphore arrays.
>     [process sleeps on it's own semaphore and is woken up by someone
> else when it can make progress]

This is similar to Oracle (and the sembench program).  Each process has
a semaphore and when it is waiting for a commit it goes to sleep on it.
They are woken up in bulk with semtimedop calls from a single process.

But oracle also uses semaphores for locking in a traditional sense.

Putting the waiters into a per-semaphore list is really only part of the
speedup.  The real boost comes from the patch to break up the locks into
a per semaphore lock.

We gain another 10-15% from a later patch that gets uses atomics on the
refcount, which lets us do sem_putref without a lock (meaning we're
lockless once we get woken up).

I'm cleaning up fixes based on suggestions here and will repost.

> - with google, I couldn't find anything relevant that uses
> multi-sembuf semop() calls.
> 

I think this should help any workload that has more than one semaphore
per array, even if they only do one sem per call.

> And I agree with Nick: We should be careful about changing the API.

Definitely, thanks for reading through it.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/