linux-kernel - Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)

Message-ID: <20180329142338.GD4043@hirez.programming.kicks-ass.net>
Date:   Thu, 29 Mar 2018 16:23:38 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        Dave Watson <davejwatson@...com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andrew Hunter <ahh@...gle.com>,
        Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
        Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
        Josh Triplett <josh@...htriplett.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences
 system call (v12)

On Thu, Mar 29, 2018 at 09:54:01AM -0400, Mathieu Desnoyers wrote:
> Let's say we disallow system calls from rseq critical sections. A few points
> arise:
> 
> - We still need to allow traps (page faults, breakpoints, ...) within rseq c.s.,
> 
> - We still need to allow interrupts within rseq c.s.,

Sure, but all those are different entry points, so that shouldn't be a
problem.

> - We need to decide whether we just document that syscalls within rseq c.s.
>   are not supported, or we enforce a behavior if this happens (e.g. SIGSEGV).
>   If we enforce a SIGSEGV, we'd have to figure out whether it's worth it to
>   add extra branches to the system call fast path to validate this.

Without enforcement someone will eventually do this :/ We might (maybe)
get away with it being a debug option somewhere, but even that sounds
like trouble.
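Here is a minimal user-space model of what an enforcing check at syscall entry might look like. It shows how much logic the check would add. The descriptor is a simplified struct loosely modeled on the upstream `struct rseq_cs` fields (`start_ip`, `post_commit_offset`, `abort_ip`). `cs_desc`, `thread_rseq`, `ip_in_cs()` and `syscall_in_rseq_cs()` are hypothetical names, not code from the patch.

The check ignores the syscall number, so it treats every syscall the same way. A real in-kernel check would also have to read the @rseq_cs pointer and the descriptor from user memory. That adds fast-path cost which this sketch does not capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified critical-section descriptor (hypothetical, assumed field set). */
struct cs_desc {
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/* Simplified per-thread state: the TLS @rseq_cs pointer, NULL when idle. */
struct thread_rseq {
	const struct cs_desc *rseq_cs;
};

/* True if ip lies in [start_ip, start_ip + post_commit_offset).
 * For ip < start_ip the unsigned subtraction wraps to a huge value,
 * so one comparison checks both bounds. */
static bool ip_in_cs(uint64_t ip, const struct cs_desc *cs)
{
	assert(cs != NULL);
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* Hypothetical syscall-entry hook: returns true when the caller should
 * get SIGSEGV. The syscall number is deliberately not consulted. */
static bool syscall_in_rseq_cs(const struct thread_rseq *t, uint64_t user_ip)
{
	assert(t != NULL);
	return t->rseq_cs != NULL && ip_in_cs(user_ip, t->rseq_cs);
}
```

Under this model the extra work is one pointer test plus one subtraction and compare. The open question in the thread is whether that, plus the user-memory accesses, belongs on every syscall or only behind a debug option.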

> - We need to carefully consider the case of system calls issued within signal
>   handlers nested on top of rseq. When RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL is
>   _not_ set, neither in the rseq c.s. descriptor nor in the TLS @flags,
>   it's pretty much straightforward: upon signal delivery, the kernel moves the
>   ip to abort, and clears the tls @rseq_cs pointer. This means that any system
>   call issued within the signal handler is not actually within the rseq c.s.
>   upon which the signal is nested.
> 
>   The case I worry about is if a thread sets the RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
>   flag in its TLS @flags field (useful in a debugging scenario where we want a
>   debugger to single-step through the rseq c.s. and observe registers at each step).
>   Arguably, this is only ever used in development. However, it does allow a situation
>   where a system call executed within a signal handler can nest over an rseq c.s.
>   So if we choose to be very strict and SIGSEGV any syscall nested over rseq
>   c.s., we may very well end up killing the process for no good reason in this
>   scenario.

Yes, that needs a little thought; but when we run the signal handler,
the IP would no longer be inside the active RSEQ, right?
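A small user-space model can illustrate the two signal cases and Peter's question. It covers signal delivery with and without the no-restart flag, then two candidate syscall checks evaluated for a syscall issued from the handler.

The names are hypothetical throughout: `signal_fixup()`, `strict_check()`, `ip_check()` and `SKETCH_NO_RESTART_ON_SIGNAL`, which stands in for RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL with a bit value chosen only for this sketch. The descriptor is the same simplified, assumed layout as a real `struct rseq_cs`. This is not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL (value assumed). */
#define SKETCH_NO_RESTART_ON_SIGNAL (1u << 1)

/* Simplified critical-section descriptor (hypothetical layout). */
struct cs_desc {
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/* Simplified per-thread rseq area: TLS @rseq_cs and TLS @flags. */
struct thread_rseq {
	const struct cs_desc *rseq_cs;
	uint32_t flags;
};

/* ip in [start_ip, start_ip + post_commit_offset), via unsigned wrap-around. */
static bool ip_in_cs(uint64_t ip, const struct cs_desc *cs)
{
	assert(cs != NULL);
	return ip - cs->start_ip < cs->post_commit_offset;
}

/* Models the fixup done before a signal handler is set up. Returns the
 * ip that user space resumes at once the handler returns. */
static uint64_t signal_fixup(struct thread_rseq *t, uint64_t ip)
{
	assert(t != NULL);
	const struct cs_desc *cs = t->rseq_cs;

	if (cs == NULL || !ip_in_cs(ip, cs))
		return ip;
	/* Restart is suppressed if either the descriptor or TLS @flags asks. */
	if ((cs->flags | t->flags) & SKETCH_NO_RESTART_ON_SIGNAL)
		return ip;
	t->rseq_cs = NULL;   /* clear the TLS @rseq_cs pointer */
	return cs->abort_ip; /* move ip to the abort handler */
}

/* Strict check: any syscall while @rseq_cs is set counts as a violation. */
static bool strict_check(const struct thread_rseq *t)
{
	assert(t != NULL);
	return t->rseq_cs != NULL;
}

/* IP-based check: only a syscall whose own ip is inside the c.s. counts. */
static bool ip_check(const struct thread_rseq *t, uint64_t syscall_ip)
{
	assert(t != NULL);
	return t->rseq_cs != NULL && ip_in_cs(syscall_ip, t->rseq_cs);
}
```

Take the debugging case, where the flag is set in TLS @flags. `signal_fixup()` leaves @rseq_cs set, so `strict_check()` would kill a process that issues a syscall from its handler. `ip_check()` does not, because the handler's ip lies outside [start_ip, start_ip + post_commit_offset). In the default case @rseq_cs is already cleared, so neither check fires.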

> - We need to decide whether all syscalls are disallowed, or if we want to pick
>   specific ones (e.g. fork()).

All.