linux-kernel - Re: [patch 00/12] futex: Cure robust/PI futex exit races

Message-ID: <20191107084136.GH30739@gmail.com>
Date:   Thu, 7 Nov 2019 09:41:36 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <darren@...art.com>,
        Yi Wang <wang.yi59@....com.cn>,
        Yang Tao <yang.tao172@....com.cn>,
        Oleg Nesterov <oleg@...hat.com>,
        Florian Weimer <fweimer@...hat.com>,
        Carlos O'Donell <carlos@...hat.com>,
        Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [patch 00/12] futex: Cure robust/PI futex exit races


* Thomas Gleixner <tglx@...utronix.de> wrote:

> This series addresses a couple of robust/PI futex exit races:
> 
>  1) The unlock races debugged and fixed by Yi and Yang
> 
>     These races are really subtle and I'm still puzzled how to trigger them
>     reliably enough to decode them.
> 
>     The basic issue is that:
> 
>     A) An unlocking task can be killed between clearing the user space
>        futex value and calling futex(FUTEX_WAKE).
> 
>     B) A woken up waiter can be killed before it can acquire the futex
>        after returning to user space.
> 
>     In both cases the futex value is 0 and due to that the robust list exit
>     code refuses to wake up waiters as the futex is not owned by the
>     exiting task. As a consequence all other waiters might be blocked
>     forever.
> 
>  2) Oleg provided a test case which causes an infinite loop in the
>     futex_lock_pi() code.
> 
>     The problem there is that an exiting task might be preempted by a
>     waiter in a state which makes the waiter busy wait for the exiting task
>     to complete the robust/PI exit cleanup code.
> 
>     That's obviously impossible when the waiter has higher priority than
>     the exiting task and both are pinned on the same CPU resulting in a
>     live lock.
> 
> #1 is a straightforward and simple fix
> 
>     The solution Yi and Yang provided looks solid and in the worst case
>     causes a spurious wakeup of a waiter which is nothing to worry about
>     as all waiter code has to be prepared for that anyway.
> 
> #2 is more complex
> 
>    In the current implementation there is no way to block until the exiting
>    task has finished the cleanup.
> 
>    Fixing this requires quite some code reshuffling, which at the same
>    time is a valuable cleanup.
> 
>    The final solution is to guard the futex exit handling with a per task
>    mutex and make the waiter block on that mutex until the exiting task has
>    completed the cleanup.
> 
>    Details why a simpler solution is not feasible can be found here:
> 
>    https://lore.kernel.org/r/20191105152728.GA5666@redhat.com
> 
>    Ignore my confusion of fork vs. vfork at the beginning of the thread.
>    Futexes do that to human brains. :)
> 
> The following series addresses both issues.
> 
> Patch 1 is a slightly polished version of the original Yi and Yang
> submission. It is included for completeness' sake and because it
> creates conflicts with the larger surgery which fixes issue #2.
> 
> Aside from that, a few more eyeballs on that subtlety are definitely not
> a bad thing, especially as this has a user space component in it.
> 
> The rest of the series addresses issue #2 which is more or less a kernel
> only problem, but extra eyeballs are appreciated.
> 
> I'm certainly not proud of the solution for #2 but it's the best I could
> come up with without violating the user/kernel state consistency
> constraints.
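
To make race #1 from the quoted cover letter concrete, here is a rough
user-space model of the exit-time ownership check, not kernel code. Every
name in it (MODEL_TID_MASK, model_exit_acts_old(), model_exit_acts_fixed(),
the pending_op and pi flags) is a hypothetical stand-in, and the "fixed"
variant is only one reading of the approach described above, not the
actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror FUTEX_TID_MASK from <linux/futex.h>; redefined
 * here so the sketch stays self-contained. */
#define MODEL_TID_MASK 0x3fffffffu

/* Old behaviour (model): the exit code only acts on a robust futex
 * if the futex word still names the exiting task as owner. */
bool model_exit_acts_old(uint32_t uval, uint32_t exiting_tid)
{
    assert(exiting_tid != 0); /* TID 0 means "unowned" in the futex word */
    return (uval & MODEL_TID_MASK) == exiting_tid;
}

/* Hypothetical fixed behaviour: if the futex was the exiting task's
 * pending operation, is not a PI futex, and currently has no owner,
 * wake one waiter anyway. In the worst case that wakeup is spurious. */
bool model_exit_acts_fixed(uint32_t uval, uint32_t exiting_tid,
                           bool pending_op, bool pi)
{
    uint32_t owner = uval & MODEL_TID_MASK;

    assert(exiting_tid != 0);
    if (pending_op && !pi && owner == 0)
        return true;
    return owner == exiting_tid;
}
```

In both case A and case B the futex word is already 0 when the task dies,
so model_exit_acts_old(0, tid) is false and nobody is woken, while the
fixed variant still wakes a waiter when the pending-operation hint is set.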

I really like the whole series - this is how it should have been
implemented originally, but the exit scenarios 'looked' so simple that it
was just open-coded ... Mea culpa. :-)

As to ->futex_exit_mutex: that's really just a consequence of the ABI, 
and a lot cleaner than all the previous pretense that these exit ops are 
atomic - which they fundamentally aren't.
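
The blocking idea can be sketched in user space with pthreads. This is
only an analogy under loud assumptions: struct model_task, the
model_exit_*() helpers and model_run() are made up for illustration, and
only the field name futex_exit_mutex is borrowed from the series. The
exiting side holds the mutex across its cleanup, and the waiter sleeps on
the mutex instead of busy waiting:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per task state. */
struct model_task {
    pthread_mutex_t futex_exit_mutex;
    bool cleanup_done;
};

/* Exiting side: take the mutex before the cleanup starts. */
void model_exit_begin(struct model_task *t)
{
    pthread_mutex_lock(&t->futex_exit_mutex);
}

/* Exiting side: mark the cleanup complete, then release the mutex. */
void model_exit_finish(struct model_task *t)
{
    t->cleanup_done = true;
    pthread_mutex_unlock(&t->futex_exit_mutex);
}

struct model_waiter_ctx {
    struct model_task *task;
    bool saw_cleanup_done;
};

/* Waiter: block on the mutex rather than spinning on the exit state. */
static void *model_waiter(void *arg)
{
    struct model_waiter_ctx *ctx = arg;

    pthread_mutex_lock(&ctx->task->futex_exit_mutex);
    ctx->saw_cleanup_done = ctx->task->cleanup_done;
    pthread_mutex_unlock(&ctx->task->futex_exit_mutex);
    return NULL;
}

/* Runs one exiting task against one waiter and reports whether the
 * waiter only proceeded after the cleanup had finished. */
bool model_run(void)
{
    struct model_task task = { .cleanup_done = false };
    struct model_waiter_ctx ctx = { .task = &task, .saw_cleanup_done = false };
    pthread_t waiter;
    int rc;

    pthread_mutex_init(&task.futex_exit_mutex, NULL);
    model_exit_begin(&task);             /* exit cleanup is now in progress */
    rc = pthread_create(&waiter, NULL, model_waiter, &ctx);
    assert(rc == 0);
    model_exit_finish(&task);            /* waiter can proceed from here on */
    pthread_join(waiter, NULL);
    pthread_mutex_destroy(&task.futex_exit_mutex);
    return ctx.saw_cleanup_done;
}
```

Because the waiter sleeps on the mutex, a lower priority exiting task
pinned to the same CPU still gets to run and finish, which is the property
the busy-wait variant lacked.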

Haven't tested the series beyond build coverage, but the high level 
principles behind the whole series look very sound to me:

Reviewed-by: Ingo Molnar <mingo@...nel.org>

Thanks,

	Ingo