Message-ID: <4BC84957.9080608@colorfullife.com>
Date: Fri, 16 Apr 2010 13:26:15 +0200
From: Manfred Spraul <manfred@...orfullife.com>
To: Chris Mason <chris.mason@...cle.com>
CC: zach.brown@...cle.com, jens.axboe@...cle.com,
linux-kernel@...r.kernel.org, Nick Piggin <npiggin@...e.de>
Subject: Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop
On 04/12/2010 08:49 PM, Chris Mason wrote:
> I have a microbenchmark to test how quickly we can post and wait in
> bulk. With this change, semtimedop is able to do more than twice
> as much work in the same run. On a large numa machine, it brings
> the IPC lock system time (reported by perf) down from 85% to 15%.
>
>
Looking at the current code:
- update_queue() can be O(N^2) if only some of the waiting tasks are
woken up.
More precisely: every task that was not woken up is rescanned each time
a task that can be woken up is found.
- Your test app tests the best case for the current code:
You wake up the tasks in the same order in which they called semop().
If you invert the order (i.e. worklist_add() adds to the head instead of
the tail), I would expect even worse performance from the current code.
The O(N^2) behavior is simple to fix; I've attached a patch.
For your micro-benchmark, the patch does not change much: you wake up
the tasks in order, so the current code does not misbehave.
Do you know how Oracle wakes up the tasks?
FIFO, LIFO, un-ordered?
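To make the ordering question concrete, here is a small user-space sketch of the rescan cost. It is only a model, not the code in ipc/sem.c: the pending queue is a flat array in arrival order, wakeable[i] says whether waiter i can be satisfied, and the function names scan_with_restart() and scan_single_pass() are made up for this illustration. The single-pass variant additionally assumes that waking one waiter cannot make an earlier waiter runnable.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Model of a scan that restarts from the head of the pending queue
 * every time it wakes a waiter. Returns how many queue entries were
 * examined in total. wakeable[i] != 0 means waiter i can be woken.
 * (Hypothetical helper for illustration, not kernel code.)
 */
size_t scan_with_restart(const int *wakeable, size_t n)
{
    int *pending = malloc(n * sizeof *pending);
    size_t examined = 0;
    int found;

    if (!pending)
        return 0;
    for (size_t i = 0; i < n; i++)
        pending[i] = 1;

    do {
        found = 0;
        for (size_t i = 0; i < n; i++) {
            if (!pending[i])
                continue;       /* already woken and unlinked */
            examined++;
            if (wakeable[i]) {
                pending[i] = 0; /* wake it and unlink it */
                found = 1;
                break;          /* restart the scan from the head */
            }
        }
    } while (found);

    free(pending);
    return examined;
}

/*
 * Model of a scan that visits every entry exactly once.
 * Assumption of this sketch: waking a waiter never makes an
 * earlier entry wakeable, so no restart is needed.
 */
size_t scan_single_pass(const int *wakeable, size_t n)
{
    size_t examined = 0;

    for (size_t i = 0; i < n; i++) {
        examined++;
        (void)wakeable[i];      /* would wake the task here if set */
    }
    return examined;
}
```

With k non-wakeable waiters at the head and m wakeable ones behind them, the restart model examines m * (k + 1) + k entries, while the single pass examines k + m. When every waiter at the head is wakeable (the in-order case), the restart model examines only one entry per wakeup, which is why an in-order benchmark would not show the problem.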
> while(unlikely(error == IN_WAKEUP)) {
> cpu_relax();
> error = queue.status;
> }
>
> - if (error != -EINTR) {
> + /*
> + * we are lock free right here, and we could have timed out or
> + * gotten a signal, so we need to be really careful with how we
> + * play with queue.status. It has three possible states:
> + *
> + * -EINTR, which means nobody has changed it since we slept. This
> + * means we woke up on our own.
> + *
> + * IN_WAKEUP, someone is currently waking us up. We need to loop
> + * here until they change it to the operation error value. If
> + * we don't loop, our process could exit before they are done waking us
> + *
> + * operation error value: we've been properly woken up and can exit
> + * at any time.
> + *
> + * If queue.status is currently -EINTR, we are still being processed
> + * by the semtimedop core. Someone either has us on a list head
> + * or is currently poking our queue struct. We need to find that
> + * reference and remove it, which is what remove_queue_from_lists
> + * does.
> + *
> + * We always check for both -EINTR and IN_WAKEUP because we have no
> + * locks held. Someone could change us from -EINTR to IN_WAKEUP at
> + * any time.
> + */
> + if (error != -EINTR && error != IN_WAKEUP) {
> /* fast path: update_queue already obtained all requested
> * resources */
No: the code tests a local variable (error), not queue.status itself.
The loop above the comment guarantees that error can't be IN_WAKEUP
at this point.
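A user-space sketch of that argument, using C11 atomics in place of the kernel's accesses. The struct sem_queue_model type, the helper names and the value chosen for IN_WAKEUP are assumptions made for this illustration only:

```c
#include <errno.h>
#include <stdatomic.h>

#define IN_WAKEUP 1 /* placeholder value for this sketch */

/* Stand-in for the status field of the sleeping task's queue entry. */
struct sem_queue_model {
    atomic_int status;
};

/*
 * Copy the shared status into a local variable and spin while the
 * copy says a wakeup is in progress. The loop can only exit when the
 * local copy is something other than IN_WAKEUP, and nobody else can
 * write to a local variable, so the value returned is never IN_WAKEUP.
 */
int wait_for_final_status(struct sem_queue_model *q)
{
    int error = atomic_load(&q->status);

    while (error == IN_WAKEUP) {
        /* the kernel loop calls cpu_relax() here */
        error = atomic_load(&q->status);
    }
    return error;
}

/*
 * After the loop, testing against -EINTR alone is enough to tell
 * "woken by another task" apart from "woke up on our own".
 */
int woken_by_other_task(int error)
{
    return error != -EINTR;
}
```

Only the shared status field can change underneath the sleeper. The local copy is stable once the loop has exited, so a second comparison against IN_WAKEUP after the loop can never be true.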
> +
> +out_putref:
> + sem_putref(sma);
> + goto out_free;
>
Is it possible to move the sem_putref into wakeup_sem_queue()?
Right now, the exit path of semtimedop doesn't touch the spinlock.
You remove that optimization.
--
Manfred
View attachment "patch-ipc-optimize_bulkwakeup" of type "text/plain" (4404 bytes)