Message-ID: <20140514092203.GE30445@twins.programming.kicks-ass.net>
Date: Wed, 14 May 2014 11:22:03 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Carlos O'Donell <carlos@...hat.com>
Cc: Darren Hart <dvhart@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Darren Hart <darren@...art.com>,
Davidlohr Bueso <davidlohr@...com>,
Ingo Molnar <mingo@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Clark Williams <williams@...hat.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Roland McGrath <roland@...k.frob.com>,
Jakub Jelinek <jakub@...hat.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity
On Wed, May 14, 2014 at 02:58:05AM -0400, Carlos O'Donell wrote:
> >> The handling of -EDEADLOCK is even more impressive. Instead of
> >> propagating it to the caller something in the guts of glibc calls pause().
> >>
> >> futex(0x601300, FUTEX_LOCK_PI_PRIVATE, 1) = -1 EDEADLK (Resource deadlock avoided)
> >> pause(
> >>
> >
> > Gotta love comments like these though - such trust!:
> >
> > /* The mutex is locked. The kernel will now take care of
> > everything. */
> >
> > IIRC, glibc takes the approach that if this operation fails, there is no way for
> > it to recover "properly", and so it chooses to:
> >
> > /* Delay the thread indefinitely. */
> >
> > I believe the thinking goes that if we get to here, then the lock is in an
> > inconsistent state (between kernel and userspace). I don't have an answer for
> > why pausing forever would be preferable to returning an error however...
>
> What error would we return?
EDEADLK is a valid user return for pthread_mutex_lock() as per:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html
> This particular case is a serious error for which we have no good error code
> to return to userspace. It's an implementation defect, a bug, we should probably
> assert instead of pausing.
No, it's perfectly fine to have a lock sequence abort with -EDEADLK.
Userspace should release its locks and re-attempt.
You can implement usable locking schemes using this error, like
wound/wait locking.
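
To make the "release its locks and re-attempt" part concrete, here is a
minimal sketch of the back-off step only. It is not glibc code and not a
full wound/wait implementation (that would also need some ordering, such
as tickets or timestamps, to decide who backs off). The helper names
init_errorcheck_mutex() and lock_pair() are made up for illustration, and
PTHREAD_MUTEX_ERRORCHECK is used only because it gives a deterministic
EDEADLK on a self-relock; with PI futexes the error would come from the
kernel's deadlock detection instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise an error-checking mutex, which reports
 * a relock by the owning thread as EDEADLK instead of blocking forever. */
static void init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(m, &attr);
    assert(rc == 0);
    (void)rc; /* keeps NDEBUG builds free of unused-variable warnings */
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical helper: take both locks or neither.
 * Returns 0 on success, or the error from pthread_mutex_lock() with no
 * lock held, so the caller is free to re-attempt the whole sequence. */
static int lock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    int rc = pthread_mutex_lock(first);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(second);
    if (rc != 0) {
        /* Back off: drop what we hold before reporting the error. */
        pthread_mutex_unlock(first);
        return rc;
    }
    return 0;
}
```

A caller would treat EDEADLK from lock_pair() as "try the sequence again
later" rather than as a fatal condition, which is only possible if the
error is propagated instead of being swallowed by pause().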
> We can't cancel the stuck thread because pthread_mutex_lock is not a cancellation
> point.
>
> In practice the rest of the application can make forward progress with a single
> thread stuck. You can attach the debugger and inspect state, so it's useful
> from that perspective.
That's just totally braindead. Return EDEADLK to userspace already, let
the user deal with it.