Message-ID: <20140731115759.GS19379@twins.programming.kicks-ass.net>
Date: Thu, 31 Jul 2014 13:57:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Ilya Dryomov <ilya.dryomov@...tank.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
ceph-devel@...r.kernel.org, davidlohr@...com, jason.low2@...com
Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra
 reschedule point"

On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>
> This commit can lead to deadlocks by way of what, at a high level,
> looks like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels. In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads with the evidence
> actually pointing to the bowels of mutex_lock().
>
> kernel/locking/mutex.c, __mutex_lock_common():
>         osq_unlock(&lock->osq);
> slowpath:
>         /*
>          * If we fell out of the spin path because of need_resched(),
>          * reschedule now, before we try-lock the mutex. This avoids getting
>          * scheduled out right after we obtained the mutex.
>          */
>         if (need_resched())
>                 schedule_preempt_disabled();  <-- never returns
> #endif
>         spin_lock_mutex(&lock->wait_lock, flags);
>
> We started hitting deadlocks in QA the day our branch was rebased onto
> 3.15 (the release this commit went into). As part of the debugging
> effort I then enabled all locking debug options, which also disabled
> CONFIG_MUTEX_SPIN_ON_OWNER and made the deadlocks disappear, which is
> why this hasn't been looked into until now. The revert makes the
> problem go away, confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.