[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53731608.3010803@redhat.com>
Date: Wed, 14 May 2014 03:06:48 -0400
From: "Carlos O'Donell" <carlos@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Darren Hart <dvhart@...ux.intel.com>
CC: LKML <linux-kernel@...r.kernel.org>, Dave Jones <davej@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Darren Hart <darren@...art.com>,
Davidlohr Bueso <davidlohr@...com>,
Ingo Molnar <mingo@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Clark Williams <williams@...hat.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Roland McGrath <roland@...k.frob.com>,
Jakub Jelinek <jakub@...hat.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [patch 0/3] futex/rtmutex: Fix issues exposed by trinity
On 05/13/2014 05:08 AM, Thomas Gleixner wrote:
> On Mon, 12 May 2014, Darren Hart wrote:
>> On Mon, May 12, 2014 at 08:45:32PM -0000, Thomas Gleixner wrote:
>>> strace tells me:
>>>
>>> futex(0x600e00, FUTEX_LOCK_PI_PRIVATE, 1) = -1 EINVAL (Invalid argument)
>>>
>>> but the return value of pthread_mutex_lock() is 0
>>
>> So something is clearly wrong there - however, were you looking at the comments
>> (sorry, I mean the C code), or the implementation (all the ASM)? The only way
>> I've been able to be sure in the past is to delete the ASM files and recompile
>> using the C files. Hopefully we'll be able to drop all the ASM in the pthread
>> calls soonish (measured in years in glibc development time scales).... sigh.
>
> The C implementation does:
>
> if (INTERNAL_SYSCALL_ERROR_P (e, __err)
> && (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
> || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK))
> {
> assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
> || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
> && kind != PTHREAD_MUTEX_RECURSIVE_NP));
> /* ESRCH can happen only for non-robust PI mutexes where
> the owner of the lock died. */
> assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
>
> /* Delay the thread indefinitely. */
> while (1)
> pause_not_cancel ();
> }
>
> So anything else than ESRCH and EDEADLK is ignored and then the thing
> happily returns 0 at the end. Unlock is the same:
The code is valid so long as you expect only ESRCH and EDEADLK
to be the only errors the kernel returns.
What other error codes are returned and under what conditions?
Are those other errors actionable by the user?
I'm inclined to abort() on anything but some agreed upon set of error returns.
Since anything other than the agreed upon set of error returns is a failure
in the coordinated implementation between glibc and the kernel.
> {
> int robust = mutex->__data.__kind & PTHREAD_MUTEX_ROBUST_NORMAL_NP;
> int private = (robust
> ? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
> : PTHREAD_MUTEX_PSHARED (mutex));
> INTERNAL_SYSCALL_DECL (__err);
> INTERNAL_SYSCALL (futex, __err, 2, &mutex->__data.__lock,
> __lll_private_flag (FUTEX_UNLOCK_PI, private));
> }
Isn't the the same issue as before? This code presumes that because the
atomic operation completed successfully that the kernel state is OK.
Assuming otherwise slows down the fast path in the unlock just to check
for kernel bugs. Is the returned error in any way actionable by the user?
Cheers,
Carlos.
Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists