Message-ID: <alpine.DEB.2.21.1901302314590.8200@nanos.tec.linutronix.de>
Date: Thu, 31 Jan 2019 00:13:51 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Sebastian Sewior <bigeasy@...utronix.de>
cc: Heiko Carstens <heiko.carstens@...ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linux-s390@...r.kernel.org,
Stefan Liebler <stli@...ux.ibm.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
On Wed, 30 Jan 2019, Sebastian Sewior wrote:
> On 2019-01-30 18:56:54 [+0100], Thomas Gleixner wrote:
> > TBH, no clue. Below are some more traceprintks which hopefully shed some
> > light on that mystery. See kernel/futex.c line 30 ...
>
> The robust list is somehow buggy. In the last trace we had the
> handle_futex_death() of uaddr 3ff9e880140 as the last action. That means
> it was the entry in 56496's ->list_op_pending. This makes sense
> because it tried to acquire the lock, failed, and got killed.

The robust list of the failing task seems to be correct.

> According to uaddr pid 56956 is the owner. So 56956 invoked one of
> pthread_mutex_lock() / pthread_mutex_timedlock() /
> pthread_mutex_trylock() and should have obtained the lock in userland.
> Depending on where it got killed, that mutex should be either recorded in
> ->list_op_pending or the robust_list (or both if it didn't clear
> ->list_op_pending yet). But it is not.
> Similar for pthread_mutex_unlock().
> We don't have a trace_point if we abort processing the list.

The only reason why it would abort is a page fault, because page faults
cannot be handled in the exit code anymore.
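
Roughly, the exit time walk does this (simplified sketch, not the exact
kernel code; the real thing is exit_robust_list()/handle_futex_death() in
kernel/futex.c and the helpers are trimmed down here):

	static void robust_exit_walk_sketch(struct task_struct *curr)
	{
		struct robust_list_head __user *head = curr->robust_list;
		struct robust_list __user *pending;
		unsigned long futex_offset;

		/* Every fetch is a user space access. If it faults, the
		   walk is silently abandoned - no trace point fires. */
		if (get_user(futex_offset, &head->futex_offset))
			return;
		if (get_user(pending, &head->list_op_pending))
			return;

		if (pending)
			handle_futex_death((u32 __user *)((unsigned long)pending
							  + futex_offset),
					   curr, 1);

		/* ... then walk the head->list.next entries the same way ... */
	}
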
> On the other hand, it didn't trigger on x86 for hours. Could the atomic

s/hours/days/ ....

> ops be the culprit?

The glibc code does:

    THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending,
                   (void *) (((uintptr_t) &mutex->__data.__list.__next)
                             | 1));
    ....
    lock in user space
       or
    lock in kernel space

    ENQUEUE_MUTEX_PI (mutex);

    THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);

ENQUEUE_MUTEX_PI() resolves to a THREAD_GETMEM() which reads the
list head from TLS, some list manipulation operations, and a final
THREAD_SETMEM() which stores the new list head.
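
From memory, the expansion boils down to something like this (simplified,
not the literal glibc macro, which also deals with the PI bit and the
__prev pointer):

    __pthread_list_t *head;

    /* THREAD_GETMEM(): read the current robust list head from TLS */
    head = THREAD_GETMEM (THREAD_SELF, robust_head.list);

    /* link the mutex in front of the list - plain stores */
    mutex->__data.__list.__next = head;

    /* THREAD_SETMEM(): store the new list head back into TLS */
    THREAD_SETMEM (THREAD_SELF, robust_head.list, &mutex->__data.__list);
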
Now on x86 THREAD_GETMEM() and THREAD_SETMEM() resolve to

    asm volatile ("movX .....")

while on s390 they are plain

    descr->member

based operations.
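
Paraphrased from the respective sysdeps tls.h files (not verbatim, just to
show the difference):

    /* x86_64: a segment relative move, one asm statement per access,
       but no "memory" clobber anywhere. */
    # define THREAD_SETMEM(descr, member, value) \
        asm volatile ("movq %q0,%%fs:%P1" \
                      : : "ir" (value), "i" (offsetof (struct pthread, member)))

    /* s390: a plain structure member access, i.e. ordinary C stores as
       far as the compiler is concerned. */
    # define THREAD_SETMEM(descr, member, value) \
        descr->member = (value)
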
Now the important part of the robust list is the store sequence, i.e. the
list entry stores and the final update of the TLS visible list head need to
come _before_ list_op_pending is cleared.

I might be missing something, but there is no compiler barrier in that code
which would prevent the compiler from reordering the stores. It can
rightfully do so because there is no compiler visible dependency between
these two operations.
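
I.e. for the two stores in question nothing stops the compiler from turning

    THREAD_SETMEM (THREAD_SELF, robust_head.list, &mutex->__data.__list);
    THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);

into the equivalent of

    THREAD_SELF->robust_head.list_op_pending = NULL;
    THREAD_SELF->robust_head.list = &mutex->__data.__list;

on s390, because both are plain stores to different locations with no
dependency between them.
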
On x86_64 the asm volatile might prevent it by chance, but it does not have
a 'memory' clobber specified which would guarantee a compiler barrier.
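
For comparison, a real compiler barrier is the classic

    asm volatile ("" : : : "memory");

and one way to enforce the required ordering would be (sketch only, not a
proposed patch):

    ENQUEUE_MUTEX_PI (mutex);
    /* compiler barrier: the "memory" clobber forbids reordering or
       caching memory accesses across this point */
    asm volatile ("" : : : "memory");
    THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
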
On s390 there is certainly nothing.

So assuming the stores are reordered and clearing list_op_pending comes
before the list head update, a task killed between the two stores leaves a
window in which the robust exit code in the kernel will see neither of
them. FAIL.

I might be wrong as usual, but this would definitely explain the fail very
well.

Thanks,

	tglx