Message-Id: <20190131165228.GA32680@osiris>
Date: Thu, 31 Jan 2019 17:52:28 +0100
From: Heiko Carstens <heiko.carstens@...ibm.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: "Paul E. McKenney" <paulmck@...ux.ibm.com>,
Sebastian Sewior <bigeasy@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linux-s390@...r.kernel.org,
Stefan Liebler <stli@...ux.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
On Thu, Jan 31, 2019 at 01:27:25AM +0100, Thomas Gleixner wrote:
> On Thu, 31 Jan 2019, Thomas Gleixner wrote:
>
> > On Wed, 30 Jan 2019, Paul E. McKenney wrote:
> > > On Thu, Jan 31, 2019 at 12:13:51AM +0100, Thomas Gleixner wrote:
> > > > I might be wrong as usual, but this would definitely explain the fail very
> > > > well.
> > >
> > > On recent versions of GCC, the fix would be to put this between the two
> > > stores that need ordering:
> > >
> > > __atomic_thread_fence(__ATOMIC_RELEASE);
> > >
> > > I must defer to Heiko on whether s390 GCC might tear the stores. My
> > > guess is "probably not". ;-)
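For illustration, the pattern Paul describes would look roughly like
this; the variable and function names here are invented for the
example, they are not taken from glibc:

	static int data;
	static int ready;

	static void publish(int value)
	{
		data = value;
		/*
		 * __atomic_thread_fence(__ATOMIC_RELEASE) is a compiler
		 * barrier and, on architectures that need one, also emits
		 * a hardware fence, so the store to "ready" cannot be
		 * reordered before the store to "data".
		 */
		__atomic_thread_fence(__ATOMIC_RELEASE);
		ready = 1;
	}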
> >
> > So I just checked the latest glibc code. It has:
> >
> > /* We must not enqueue the mutex before we have acquired it.
> > Also see comments at ENQUEUE_MUTEX. */
> > __asm ("" ::: "memory");
> > ENQUEUE_MUTEX_PI (mutex);
> > /* We need to clear op_pending after we enqueue the mutex. */
> > __asm ("" ::: "memory");
> > THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> >
> > 8f9450a0b7a9 ("Add compiler barriers around modifications of the robust mutex list.")
> >
> > in the glibc repository, there since Dec 24 2016 ...
>
> And of course, I'm using the latest greatest glibc for testing that, so I'm
> not at all surprised that it just does not reproduce on my tests.
As discussed on IRC: I used plain vanilla glibc version 2.28 for my
tests. This version already contains the commit you mentioned above.
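For reference, the ordering those barriers enforce can be sketched as
follows; the types and helpers below are simplified placeholders, not
the real glibc internals:

	struct mutex_t;

	struct thread {
		struct { void *list_op_pending; } robust_head;
	};

	/* placeholder helpers standing in for the glibc machinery */
	extern void lock_futex_pi(struct mutex_t *m);
	extern void enqueue_mutex_pi(struct thread *self, struct mutex_t *m);

	static void lock_robust_pi(struct thread *self, struct mutex_t *m)
	{
		/* tell the kernel an acquisition is in flight */
		self->robust_head.list_op_pending = m;
		lock_futex_pi(m);
		/* the mutex must not be enqueued before it is owned */
		__asm ("" ::: "memory");
		enqueue_mutex_pi(self, m);	/* ENQUEUE_MUTEX_PI */
		/* op_pending may only be cleared after the enqueue */
		__asm ("" ::: "memory");
		self->robust_head.list_op_pending = NULL;
	}

The kernel walks the robust list and list_op_pending when the thread
dies; if the compiler moved the NULL store before the enqueue, a death
in between would leave the mutex invisible to the kernel's cleanup.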
> I just hacked the ordering and restarted the test. If the theory holds,
> then this should die sooner than later.
...nevertheless Stefan and I looked through the lovely disassembly of
_pthread_mutex_lock_full() to verify whether the compiler barriers
actually do what they are supposed to do. The generated code, however,
does look correct.
So, it must be something different.