Message-ID: <alpine.DEB.2.21.1902031718170.8200@nanos.tec.linutronix.de>
Date: Sun, 3 Feb 2019 17:30:39 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Heiko Carstens <heiko.carstens@...ibm.com>
cc: Sebastian Sewior <bigeasy@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linux-s390@...r.kernel.org,
Stefan Liebler <stli@...ux.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
On Sat, 2 Feb 2019, Heiko Carstens wrote:
> On Sat, Feb 02, 2019 at 11:14:27AM +0100, Thomas Gleixner wrote:
> > On Sat, 2 Feb 2019, Heiko Carstens wrote:
> > So after the unlock @timestamp 337.215675 the kernel does not deal with
> > that futex at all until the failed lock attempt where it rightfully rejects
> > the attempt due to the alleged owner being gone.
> >
> > So this looks more like user space doing something stupid...
> >
> > As we talked about the missing barriers before, I just looked at
> > pthread_mutex_trylock() and that does still:
> >
> >   if (robust)
> >     {
> >       ENQUEUE_MUTEX_PI (mutex);
> >       THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> >     }
> >
> > So it's missing the barriers which pthread_mutex_lock() has. Grasping for
> > straws obviously....
Looks more like a solid tree than a straw now. :)
> Excellent! Taking a look into the disassembly of nptl/pthread_mutex_trylock.o
> reveals this part:
>
> 140: a5 1b 00 01 oill %r1,1
> 144: e5 48 a0 f0 00 00 mvghi 240(%r10),0 <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> 14a: e3 10 a0 e0 00 24 stg %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI
Awesome.
> I added a barrier between those two and now the code looks like this:
>
> 140: a5 1b 00 01 oill %r1,1
> 144: e3 10 a0 e0 00 24 stg %r1,224(%r10)
> 14a: e5 48 a0 f0 00 00 mvghi 240(%r10),0
>
> Looks like this was a one instruction race...
Fun. JFYI, I said that I reversed the stores in glibc and on my x86 test VM
it took more than _3_ days to trigger. But the good news is that the trace
looks exactly like the ones you provided. So it looks like we are on the
right track.
> I'll try to reproduce with the patch below (sprinkling compiler
> barriers just like the other files have).
Looks about right.
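For anyone following along, here is a minimal sketch of the ordering we are
after. It is not the glibc code and not the patch under discussion: the struct,
the function name and the barrier macro are made up for illustration, and the
barrier assumes GCC or Clang inline asm. It only stops the compiler from
reordering the two stores, which is what the s390 disassembly above shows
happening; it is not a CPU memory barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread robust list head. */
struct robust_head_sketch {
	void *list;            /* head of the list of held robust mutexes */
	void *list_op_pending; /* mutex whose lock/unlock is in flight */
};

/* Compiler-only barrier (GCC/Clang): forbids moving memory accesses
 * across this point at compile time. Emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Illustrative helper: enqueue the mutex first and only then clear
 * list_op_pending, so at every instant at least one of the two fields
 * still points at the mutex. */
static void enqueue_then_clear(struct robust_head_sketch *h, void *mutex)
{
	assert(h != NULL);

	h->list_op_pending = mutex; /* announce the pending operation */
	compiler_barrier();
	h->list = mutex;            /* stand-in for the last ENQUEUE store */
	compiler_barrier();         /* keep the clear after the enqueue */
	h->list_op_pending = NULL;  /* operation complete */
}
```

Without the second barrier the compiler is free to emit the NULL store first,
which opens a window in which neither field references the mutex.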
Thanks,
tglx