Message-ID: <51558407.5040005@surriel.com>
Date: Fri, 29 Mar 2013 08:07:35 -0400
From: Rik van Riel <riel@...riel.com>
To: Michel Lespinasse <walken@...gle.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Sasha Levin <sasha.levin@...cle.com>,
torvalds@...ux-foundation.org, davidlohr.bueso@...com,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
hhuang@...hat.com, jason.low2@...com, lwoodman@...hat.com,
chegu_vinod@...com, Dave Jones <davej@...hat.com>,
benisty.e@...il.com, Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v2 -mm -next] ipc,sem: fix lockdep false positive
On 03/28/2013 10:50 PM, Michel Lespinasse wrote:
> On Thu, Mar 28, 2013 at 1:23 PM, Rik van Riel <riel@...riel.com> wrote:
>> Subject: [PATCH -mm -next] ipc,sem: change locking scheme to make lockdep happy
>>
>> Unfortunately the locking scheme originally proposed has false positives
>> with lockdep. This can be fixed by changing the code to only ever take
>> one lock, and making sure that other relevant locks are not locked, before
>> entering a critical section.
>>
>> For the "global lock" case, this is done by taking the sem_array lock,
>> and then (potentially) waiting for all the semaphore's spinlocks to be
>> unlocked.
>>
>> For the "local lock" case, we wait on the sem_array's lock to be free,
>> before taking the semaphore local lock. To prevent races, we need to
>> check again after we have taken the local lock.
>>
>> Suggested-by: Peter Zijlstra <peterz@...radead.org>
>> Reported-by: Sasha Levin <sasha.levin@...cle.com>
>> Signed-off-by: Rik van Riel <riel@...hat.com>
>
> TL;DR: The locking algorithm is not familiar to me, but it seems
> sound. There are some implementation details I don't like. More
> below...
>
>> ---
>> ipc/sem.c | 55 ++++++++++++++++++++++++++++++++++++++++---------------
>> 1 files changed, 40 insertions(+), 15 deletions(-)
>>
>> diff --git a/ipc/sem.c b/ipc/sem.c
>> index 36500a6..87b74d5 100644
>> --- a/ipc/sem.c
>> +++ b/ipc/sem.c
>> @@ -320,24 +320,39 @@ void __init sem_init (void)
>> }
>>
>> /*
>> - * If the sem_array contains just one semaphore, or if multiple
>> - * semops are performed in one syscall, or if there are complex
>> - * operations pending, the whole sem_array is locked.
>> - * If one semop is performed on an array with multiple semaphores,
>> - * get a shared lock on the array, and lock the individual semaphore.
>> + * If the request contains only one semaphore operation, and there are
>> + * no complex transactions pending, lock only the semaphore involved.
>> + * Otherwise, lock the entire semaphore array, since we either have
>> + * multiple semaphores in our own semops, or we need to look at
>> + * semaphores from other pending complex operations.
>> *
>> * Carefully guard against sma->complex_count changing between zero
>> * and non-zero while we are spinning for the lock. The value of
>> * sma->complex_count cannot change while we are holding the lock,
>> * so sem_unlock should be fine.
>> + *
>> + * The global lock path checks that all the local locks have been released,
>> + * checking each local lock once. This means that the local lock paths
>> + * cannot start their critical sections while the global lock is held.
>> */
>> static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>> int nsops)
>> {
>> int locknum;
>> + again:
>> if (nsops == 1 && !sma->complex_count) {
>> struct sem *sem = sma->sem_base + sops->sem_num;
>>
>> + /*
>> + * Another process is holding the global lock on the
>> + * sem_array. Wait for that process to release the lock,
>> + * before acquiring our lock.
>> + */
>> + if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
>> + spin_unlock_wait(&sma->sem_perm.lock);
>> + goto again;
>> + }
>> +
>
> So, there are a few things I don't like about spin_unlock_wait():
>
> 1- From a lock ordering point of view, it is strictly equivalent to
> taking the lock and then releasing it - and yet, lockdep won't catch
> any deadlocks that involve spin_unlock_wait. (Not your fault here,
> this should be fixed as a separate change in lockdep. I manually
> looked at the lock ordering here and found it safe).
>
> 2- With the current ticket lock implementation, a stream of lockers
> can starve spin_unlock_wait() forever. Once again, not your fault and
> I suspect this could be fixed - I expect spin_unlock_wait() callers
> actually only want to know that the lock has been passed on, not that
> it actually got to an unlocked state.
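>
> (To illustrate - IIRC the x86 version is currently roughly:
>
> 	static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> 	{
> 		/* Spins until head == tail, i.e. a fully unlocked state. */
> 		while (arch_spin_is_locked(lock))
> 			cpu_relax();
> 	}
>
> so a steady stream of new ticket holders can keep is_locked true
> forever. Remembering the ticket of the holder we saw on entry, and
> spinning only until that ticket is released, would be enough for
> callers like this one.)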
>
> 3- Regarding your actual use here - I find it confusing to call
> spin_unlock_wait() before holding any other lock. The pattern I expect
> to see is that people take one lock, then see that the other lock they
> want is already taken, so they release the first lock and wait on the
> second. So, I'd suggest we remove the sem_perm.lock checks here and
> deal with this in a retry path further down.
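>
> In other words, something like this, with hypothetical locks A and B:
>
> retry:
> 	spin_lock(&A);
> 	if (spin_is_locked(&B)) {
> 		/* Drop A before waiting, so we never spin holding a lock. */
> 		spin_unlock(&A);
> 		spin_unlock_wait(&B);
> 		goto retry;
> 	}
> 	/* Critical section: we hold A and observed B unlocked. */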
>
>> /* Lock just the semaphore we are interested in. */
>> spin_lock(&sem->lock);
>>
>> @@ -347,17 +362,33 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>> */
>> if (unlikely(sma->complex_count)) {
>> spin_unlock(&sem->lock);
>> - goto lock_all;
>> + goto lock_array;
>> + }
>> +
>> + /*
>> + * Another process is holding the global lock on the
>> + * sem_array; we cannot enter our critical section,
>> + * but have to wait for the global lock to be released.
>> + */
>> + if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
>> + spin_unlock(&sem->lock);
>> + goto again;
>
> This is IMO where the spin_unlock_wait(&sma->sem_perm.lock) would
> belong - right before the goto again.
That is where I had it initially. I may have gotten too clever
and worked on keeping more accesses read-only. If you want, I
can move it back here and re-submit the patch :)
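
Something like this (sketch, not tested):

	if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
		spin_unlock(&sem->lock);
		/* Wait for the global lock to be released, without taking it. */
		spin_unlock_wait(&sma->sem_perm.lock);
		goto again;
	}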
> Also - I think there is a risk that an endless stream of complex
> semops could starve a simple semop here, as it would always find the
> sem_perm.lock to be locked ??? One easy way to guarantee progress
> would be to goto lock_array instead; however there is then the issue
> that a complex semop could force an endless stream of following simple
> semops to take the lock_array path. I'm not sure which of these
> problems is preferable to have...
If starvation turns out to be an issue, there is an even
simpler solution:
	if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
		spin_unlock(&sem->lock);
		/* The ticket lock guarantees we get the global lock. */
		spin_lock(&sma->sem_perm.lock);
		/* Take our local lock while still holding the global one. */
		spin_lock(&sem->lock);
		spin_unlock(&sma->sem_perm.lock);
	}
This could be followed by unconditionally entering the critical
section under the single semaphore's lock, because we know that a
subsequent taker of sma->sem_perm.lock will either grab a different
semaphore's spinlock, wait on the same semaphore's spinlock, or wait
for us to release our spinlock.
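
The local lock path of sem_lock() would then look something like this
(sketch only, untested):

	if (nsops == 1 && !sma->complex_count) {
		struct sem *sem = sma->sem_base + sops->sem_num;

		/* Lock just the semaphore we are interested in. */
		spin_lock(&sem->lock);

		/* Fall back to the global lock if a complex op appeared. */
		if (unlikely(sma->complex_count)) {
			spin_unlock(&sem->lock);
			goto lock_array;
		}

		/* Queue up fairly behind any global lock holder. */
		if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
			spin_unlock(&sem->lock);
			spin_lock(&sma->sem_perm.lock);
			spin_lock(&sem->lock);
			spin_unlock(&sma->sem_perm.lock);
		}

		return sops->sem_num;
	}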
--
All rights reversed.