linux-kernel - Re: [PATCH v2] ipc/sem.c: fix lockup, restore FIFO behavior


Message-ID: <51A11851.2010303@colorfullife.com>
Date:	Sat, 25 May 2013 22:00:17 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	Davidlohr Bueso <davidlohr.bueso@...com>
CC:	Rik van Riel <riel@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, hhuang@...hat.com,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2] ipc/sem.c: fix lockup, restore FIFO behavior

On 05/25/2013 08:32 PM, Davidlohr Bueso wrote:
> Yep, could you please explain what benefits you see in keeping FIFO order?
a) It's user space visible.

b) It's a well-defined behavior that might even make sense for some 
applications.
     Right now, a semop with two operations, "+1, then -2", is prioritized
over a semop with "-1".
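
For reference, the two competing requests could be written as follows. This is only a sketch: value_after() is a hypothetical helper, not part of any test case in this thread, and IPC_NOWAIT is used only so that the example never blocks. It shows that both requests can be satisfied once the semaphore value is 1, which is why the order in which waiters are served is visible to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller has to define union semun itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* The complex request: "+1, then -2", applied atomically by one semop(). */
static struct sembuf complex_op[2] = {
    { .sem_num = 0, .sem_op = +1, .sem_flg = IPC_NOWAIT },
    { .sem_num = 0, .sem_op = -2, .sem_flg = IPC_NOWAIT },
};

/* The simple request: "-1". */
static struct sembuf simple_op[1] = {
    { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT },
};

/*
 * Hypothetical helper: create a private set with one semaphore set to
 * `initial`, apply `ops` in one semop() call, and remove the set again.
 * Returns the resulting value, -1 if the request would have blocked,
 * or -2 if the set could not be created or used.
 */
static int value_after(int initial, struct sembuf *ops, size_t nops)
{
    assert(nops > 0);

    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -2;

    union semun arg = { .val = initial };
    int result = -2;

    if (semctl(id, 0, SETVAL, arg) == 0) {
        if (semop(id, ops, nops) == 0)
            result = semctl(id, 0, GETVAL);
        else if (errno == EAGAIN)
            result = -1;    /* would block; no operation was applied */
    }

    semctl(id, 0, IPC_RMID);
    return result;
}
```

With SysV IPC available, value_after(1, complex_op, 2) and value_after(1, simple_op, 1) should both return 0, while both return -1 for an initial value of 0.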

And: It doesn't cost much:
- no impact for users that use only single-op operations.
- no impact for users that use only multi-op operations
- for users that use both types: In the worst case some linked list 
splicing.

Actually, the code is probably faster because wait-for-zero ops are only 
scanned when the semaphore values are 0.

>> Acked-by: Rik van Riel <riel@...hat.com>
>>
>>> - simpler check_restart logic.
>>> - Efficient handling of wait-for-zero semops, both simple and complex.
>>> - Fewer restarts in update_queue(), because pending wait-for-zero do not
>>>     force a restart anymore.
>>>
>>> Other changes:
>>> - try_atomic_semop() also performs the semop. Thus rename the function.
>>>
>>> It passes tests with qemu, but not boot-tested due to EFI problems.
> I think this still needs a *lot* of testing - I don't have my Oracle
> workload available right now, but I will definitely see how this patch
> behaves on it. That said, I believe Oracle is already quite happy
> with the sem improvements.
Ah, ok.
The change is good for one application, and the risk of breaking other
apps is considered negligible.

>
> Furthermore, this patch is way too invasive for considering it for 3.10
> - I like Rik's patch better because it simply addresses the issue and
> nothing more.
I would disagree:
My patch is testable - with it applied, linux-3.10 should behave
exactly as linux-3.9.
Except for the scalability - the new sem_lock from Rik is great.

My problem with Rik's patch is that it is untestable:
It changes the behavior and we must hope that nothing breaks.

Actually, the latest patch makes it a bit worse:
> @@ -720,16 +718,11 @@ static int update_queue(struct sem_array *sma, int semnum, struct list_head *pt)
>   
>   		unlink_queue(sma, q);
>   
> -		if (error) {
> -			restart = 0;
> -		} else {
> -			semop_completed = 1;
> -			restart = check_restart(sma, q);
> -		}
> +		semop_completed = 1;
> +		if (check_restart(sma, q))
> +			*restart = 1;
>   
>   		wake_up_sem_queue_prepare(pt, q, error);
> -		if (restart)
> -			goto again;
If check_restart returns "1", then the current (3.10-rc1) code
restarts immediately ("goto again").
Now the rest of the queue is processed completely, and only afterwards
is it scanned again.

This means that wait-for-zero now succeeds only if a semaphore value 
stays zero.
For 3.9, it was sufficient if the value was temporarily zero.
Before the change, complex wait-for-zero would work, only simple 
wait-for-zero would be starved.
Now all operations are starved.
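
The difference between the two restart policies can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: the single FIFO array, the three op kinds, and the names try_op(), scan_queue() and wait_zero_completes() are invented for illustration and are not the kernel's update_queue() code. The model restarts whenever a completed operation changed the value, which is simpler than the real check_restart() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_kind { WAIT_ZERO, ADD, WAIT_ZERO_THEN_ADD };

struct pending_op {
    enum op_kind kind;
    int delta;      /* ADD: may be negative; WAIT_ZERO_THEN_ADD: positive */
    bool done;
};

/* Try one sleeping operation; apply it and return true if it can complete. */
static bool try_op(const struct pending_op *op, int *val)
{
    switch (op->kind) {
    case WAIT_ZERO:
        return *val == 0;
    case ADD:
        if (*val + op->delta < 0)
            return false;
        *val += op->delta;
        return true;
    case WAIT_ZERO_THEN_ADD:
        if (*val != 0)
            return false;
        *val += op->delta;
        return true;
    }
    return false;
}

/*
 * Scan the pending queue in FIFO order.
 * immediate_restart == true:  go back to the head as soon as a completed
 *                             operation changed the value ("goto again").
 * immediate_restart == false: finish the pass first, rescan afterwards.
 * Returns the number of operations that completed.
 */
static int scan_queue(struct pending_op *q, size_t n, int *val,
                      bool immediate_restart)
{
    assert(q != NULL && val != NULL);

    int completed = 0;
    bool rescan;

    do {
        rescan = false;
        for (size_t i = 0; i < n; i++) {
            if (q[i].done)
                continue;
            int before = *val;
            if (!try_op(&q[i], val))
                continue;
            q[i].done = true;
            completed++;
            if (*val != before) {
                rescan = true;
                if (immediate_restart)
                    break;
            }
        }
    } while (rescan);

    return completed;
}

/*
 * Assumed scenario: value 2 and three sleepers in FIFO order:
 * wait-for-zero, "-2", and "wait-for-zero, then +1".
 * Returns whether the simple wait-for-zero at the head completes.
 */
static bool wait_zero_completes(bool immediate_restart)
{
    struct pending_op q[] = {
        { .kind = WAIT_ZERO },
        { .kind = ADD, .delta = -2 },
        { .kind = WAIT_ZERO_THEN_ADD, .delta = +1 },
    };
    int val = 2;

    scan_queue(q, sizeof(q) / sizeof(q[0]), &val, immediate_restart);
    return q[0].done;
}
```

In this model the "-2" brings the value to zero only temporarily. With the immediate restart the head wait-for-zero sees that zero and completes; with the deferred rescan the third operation raises the value to 1 first, so the head wait-for-zero keeps sleeping.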

I've attached a test case:
     ./test5.sh
linux-3.9 completes all operations.
With Rik's patch, the wait-for-zero remains running.

--
     Manfred

P.S.:
Btw, I found some code that uses a semop with 2 ops:
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Fapis%2Fapiexusmem.htm

View attachment "change.c" of type "text/plain" (1862 bytes)

View attachment "createary.c" of type "text/plain" (899 bytes)

View attachment "Makefile" of type "text/plain" (261 bytes)

View attachment "removeary.c" of type "text/plain" (900 bytes)

Download attachment "test5.sh" of type "application/x-shellscript" (514 bytes)