linux-kernel - Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity

Message-ID: <20140514092203.GE30445@twins.programming.kicks-ass.net>
Date:	Wed, 14 May 2014 11:22:03 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Carlos O'Donell <carlos@...hat.com>
Cc:	Darren Hart <dvhart@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Darren Hart <darren@...art.com>,
	Davidlohr Bueso <davidlohr@...com>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Clark Williams <williams@...hat.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Roland McGrath <roland@...k.frob.com>,
	Jakub Jelinek <jakub@...hat.com>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity

On Wed, May 14, 2014 at 02:58:05AM -0400, Carlos O'Donell wrote:
> >>    The handling of -EDEADLOCK is even more impressive. Instead of
> >>    propagating it to the caller something in the guts of glibc calls pause().
> >>
> >>      futex(0x601300, FUTEX_LOCK_PI_PRIVATE, 1) = -1 EDEADLK (Resource deadlock avoided)
> >>      pause(
> >>
> > 
> > Gotta love comments like these though - such trust!:
> > 
> > 	/* The mutex is locked.  The kernel will now take care of
> >            everything. */
> > 
> > IIRC, glibc takes the approach that if this operation fails, there is no way for
> > it to recover "properly", and so it chooses to:
> > 
> > 	/* Delay the thread indefinitely. */
> > 
> > I believe the thinking goes that if we get to here, then the lock is in an
> > inconsistent state (between kernel and userspace). I don't have an answer for
> > why pausing forever would be preferable to returning an error however...
> 
> What error would we return?

EDEADLK is a valid user return for pthread_mutex_lock() as per:

  http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html

> This particular case is a serious error for which we have no good error code
> to return to userspace. It's an implementation defect, a bug, we should probably
> assert instead of pausing.

No, it's perfectly fine to have a lock sequence abort with -EDEADLK.
Userspace should release its locks and re-attempt.

You can implement usable locking schemes using this error, like
wound/wait locking.

> We can't cancel the stuck thread because pthread_mutex_lock is not a cancellation
> point.
> 
> In practice the rest of the application can make forward progress with a single
> thread stuck. You can attach the debugger and inspect state, so it's useful
> from that perspective.

That's just totally braindead. Return EDEADLK to userspace already, let
the user deal with it.
