Message-ID: <alpine.DEB.2.21.1902031718170.8200@nanos.tec.linutronix.de>
Date:   Sun, 3 Feb 2019 17:30:39 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Heiko Carstens <heiko.carstens@...ibm.com>
cc:     Sebastian Sewior <bigeasy@...utronix.de>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        LKML <linux-kernel@...r.kernel.org>, linux-s390@...r.kernel.org,
        Stefan Liebler <stli@...ux.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede

On Sat, 2 Feb 2019, Heiko Carstens wrote:

> On Sat, Feb 02, 2019 at 11:14:27AM +0100, Thomas Gleixner wrote:
> > On Sat, 2 Feb 2019, Heiko Carstens wrote:
> > So after the unlock @timestamp 337.215675 the kernel does not deal with
> > that futex at all until the failed lock attempt where it rightfully rejects
> > the attempt due to the alleged owner being gone.
> > 
> > So this looks more like user space doing something stupid...
> > 
> > As we talked about the missing barriers before, I just looked at
> > pthread_mutex_trylock() and that does still:
> > 
> > 	if (robust)
> >           {
> >             ENQUEUE_MUTEX_PI (mutex);
> >             THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> >           }
> > 
> > So it's missing the barriers which pthread_mutex_lock() has. Grasping for
> > straws obviously....

Looks more like a solid tree than a straw now. :)

> Excellent! Taking a look into the disassembly of nptl/pthread_mutex_trylock.o
> reveals this part:
> 
> 140:   a5 1b 00 01             oill    %r1,1
> 144:   e5 48 a0 f0 00 00       mvghi   240(%r10),0   <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> 14a:   e3 10 a0 e0 00 24       stg     %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI

Awesome.

> I added a barrier between those two and now the code looks like this:
> 
> 140:   a5 1b 00 01             oill    %r1,1
> 144:   e3 10 a0 e0 00 24       stg     %r1,224(%r10)
> 14a:   e5 48 a0 f0 00 00       mvghi   240(%r10),0
> 
> Looks like this was a one instruction race...

Fun. JFYI, I said that I reversed the stores in glibc and on my x86 test VM
it took more than _3_ days to trigger. But the good news is that the trace
looks exactly like the ones you provided. So it looks like we are on the
right track.

> I'll try to reproduce with the patch below (sprinkling compiler
> barriers just like the other files have).

Looks about right.

Thanks,

	tglx
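
For readers following the thread, the store ordering discussed above can be
sketched in plain C. This is only an illustration under stated assumptions,
not the glibc code or the actual patch: struct robust_head_sketch,
compiler_barrier(), enqueue_then_clear() and ordering_sketch_demo() are
hypothetical names, and the barrier uses the GCC/Clang inline-asm idiom. It
shows the list store being issued before the list_op_pending clear, with a
compiler-only barrier keeping the compiler from swapping the two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread robust list head.
 * The layout is an assumption for illustration, not glibc's real one. */
struct robust_head_sketch {
    void *list;             /* head of the robust mutex list */
    void *list_op_pending;  /* mutex currently being acquired or released */
};

/* Compiler-only barrier (GCC/Clang idiom): the compiler may not move
 * memory accesses across it. It emits no CPU fence instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Enqueue the mutex first, then clear the pending marker.
 * Without the barrier the compiler is free to emit the stores in
 * either order, which is the reordering seen in the disassembly. */
static void enqueue_then_clear(struct robust_head_sketch *head,
                               void *mutex_entry)
{
    head->list = mutex_entry;      /* the enqueue store */
    compiler_barrier();
    head->list_op_pending = NULL;  /* only now drop the pending marker */
}

/* Single-threaded check of the final state. It cannot prove the
 * ordering itself; that has to be checked in the disassembly. */
static int ordering_sketch_demo(void)
{
    int dummy_mutex = 0;
    struct robust_head_sketch head = { NULL, &dummy_mutex };

    enqueue_then_clear(&head, &dummy_mutex);
    return head.list == (void *)&dummy_mutex && head.list_op_pending == NULL;
}
```

The barrier only constrains the compiler, which matches the case in this
thread where the compiler emitted the two stores in the opposite order.
Whether the ordering holds in a given build still has to be confirmed by
looking at the generated instructions, as Heiko did.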
