linux-kernel - Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140731115759.GS19379@twins.programming.kicks-ass.net>
Date:	Thu, 31 Jul 2014 13:57:59 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Ilya Dryomov <ilya.dryomov@...tank.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
	ceph-devel@...r.kernel.org, davidlohr@...com, jason.low2@...com
Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra
 reschedule point"

On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
> 
> This commit can lead to deadlocks by way of what at a high level
> appears to look like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels.  In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads with the evidence
> actually pointing to the bowels of mutex_lock().
> 
> kernel/locking/mutex.c, __mutex_lock_common():
> 476         osq_unlock(&lock->osq);
> 477 slowpath:
> 478         /*
> 479          * If we fell out of the spin path because of need_resched(),
> 480          * reschedule now, before we try-lock the mutex. This avoids getting
> 481          * scheduled out right after we obtained the mutex.
> 482          */
> 483         if (need_resched())
> 484                 schedule_preempt_disabled(); <-- never returns
> 485 #endif
> 486         spin_lock_mutex(&lock->wait_lock, flags);
> 
> We started bumping into deadlocks in QA the day our branch has been
> rebased onto 3.15 (the release this commit went in) but then as part of
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
> which is why it hasn't been looked into until now.  Revert makes the
> problem go away, confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.

Content of type "application/pgp-signature" skipped