Message-ID: <4BC84957.9080608@colorfullife.com>
Date:	Fri, 16 Apr 2010 13:26:15 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	Chris Mason <chris.mason@...cle.com>
CC:	zach.brown@...cle.com, jens.axboe@...cle.com,
	linux-kernel@...r.kernel.org, Nick Piggin <npiggin@...e.de>
Subject: Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop

On 04/12/2010 08:49 PM, Chris Mason wrote:
> I have a microbenchmark to test how quickly we can post and wait in
> bulk.  With this change, semtimedop is able to do more than twice
> as much work in the same run.  On a large numa machine, it brings
> the IPC lock system time (reported by perf) down from 85% to 15%.
>
>    
Looking at the current code:
- update_queue() can be O(N^2) if only some of the waiting tasks are
woken up.
More precisely: every time a task that can be woken up is found, all
tasks that were not woken up are scanned again.

- Your test app tests the best case for the current code:
You wake up the tasks in the same order in which they called semop().
If you invert the order (i.e. worklist_add() adds to the head instead of
the tail), I would expect even worse performance from the current code.

The O(N^2) behavior is simple to fix; I've attached a patch.
For your micro-benchmark, the patch does not change much: you wake up
the tasks in order, so the current code does not misbehave.
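
To make the rescan cost concrete, here is a toy user-space model (not
kernel code, and not the attached patch). The function name, the flag
array and the 64-entry limit are assumptions made only for this sketch.
Every successful wakeup restarts the scan at the head of the queue, and
the function counts how many queue entries get examined.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a wakeup scan that restarts after every wakeup.
 * wakeable[i] says whether the operation of sleeping task i could
 * succeed now; index 0 is the head of the queue.
 * Returns the total number of queue entries examined, or 0 if n
 * exceeds the toy limit.
 */
unsigned long scan_with_restart(const bool *wakeable, size_t n)
{
	bool pending[64];		/* toy limit, assumption of this sketch */
	unsigned long examined = 0;
	size_t i;

	if (n > 64)
		return 0;
	for (i = 0; i < n; i++)
		pending[i] = true;

	i = 0;
	while (i < n) {
		if (!pending[i]) {	/* already woken and unlinked */
			i++;
			continue;
		}
		examined++;
		if (wakeable[i]) {
			pending[i] = false;	/* wake the task, unlink it */
			i = 0;			/* rescan all remaining sleepers */
		} else {
			i++;
		}
	}
	return examined;
}
```

With 8 sleepers that are all wakeable in queue order, the model examines
8 entries. If only the last 4 of the 8 are wakeable, it examines 24,
because the 4 sleepers at the head are rescanned after every wakeup.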

Do you know how Oracle wakes up the tasks?
FIFO, LIFO, unordered?

> 	while(unlikely(error == IN_WAKEUP)) {
>   		cpu_relax();
>   		error = queue.status;
>   	}
>
> -	if (error != -EINTR) {
> +	/*
> +	 * we are lock free right here, and we could have timed out or
> +	 * gotten a signal, so we need to be really careful with how we
> +	 * play with queue.status.  It has three possible states:
> +	 *
> +	 * -EINTR, which means nobody has changed it since we slept.  This
> +	 * means we woke up on our own.
> +	 *
> +	 * IN_WAKEUP, someone is currently waking us up.  We need to loop
> +	 * here until they change it to the operation error value.  If
> +	 * we don't loop, our process could exit before they are done waking us
> +	 *
> +	 * operation error value: we've been properly woken up and can exit
> +	 * at any time.
> +	 *
> +	 * If queue.status is currently -EINTR, we are still being processed
> +	 * by the semtimedop core.  Someone either has us on a list head
> +	 * or is currently poking our queue struct.  We need to find that
> +	 * reference and remove it, which is what remove_queue_from_lists
> +	 * does.
> +	 *
> +	 * We always check for both -EINTR and IN_WAKEUP because we have no
> +	 * locks held.  Someone could change us from -EINTR to IN_WAKEUP at
> +	 * any time.
> +	 */
> +	if (error != -EINTR && error != IN_WAKEUP) {
>   		/* fast path: update_queue already obtained all requested
>   		 * resources */
No: the code tests a local variable. The loop above the comment
guarantees that error can't be IN_WAKEUP at this point.
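
A minimal user-space sketch of that argument, using C11 atomics in place
of the kernel primitives. struct toy_queue, wait_for_final_status() and
the value chosen for IN_WAKEUP are placeholders for this illustration,
not the definitions from ipc/sem.c.

```c
#include <errno.h>
#include <stdatomic.h>

#define IN_WAKEUP 1	/* placeholder value for this sketch */

struct toy_queue {
	_Atomic int status;	/* written by the waker, read by the sleeper */
};

/*
 * Spin until the waker has replaced IN_WAKEUP with the final result.
 * "error" is a private copy: once the loop exits, nobody else can
 * change it, so a later "error != IN_WAKEUP" test is always true.
 */
int wait_for_final_status(struct toy_queue *q)
{
	int error = atomic_load(&q->status);

	while (error == IN_WAKEUP) {
		/* the kernel code calls cpu_relax() here */
		error = atomic_load(&q->status);
	}
	return error;
}
```

queue.status itself may still change concurrently, but the if statement
only looks at the copy that the loop has already filtered.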

> +
> +out_putref:
> +	sem_putref(sma);
> +	goto out_free;
>    
Is it possible to move the sem_putref() call into wakeup_sem_queue()?
Right now, the exit path of semtimedop doesn't touch the spinlock.
Your patch removes that optimization.

--
     Manfred

View attachment "patch-ipc-optimize_bulkwakeup" of type "text/plain" (4404 bytes)