linux-kernel - Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACT4Y+Ziom32P0fWVG_mWM5Vrw9Y6Xem=mqH0Jt21VVVaNEODA@mail.gmail.com>
Date: Thu, 6 Feb 2025 19:06:15 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: aruna.ramakrishna@...cle.com, mathieu.desnoyers@...icios.com, 
	peterz@...radead.org, paulmck@...nel.org, boqun.feng@...il.com
Cc: dave.hansen@...ux.intel.com, jannh@...gle.com, jeffxu@...omium.org, 
	jorgelo@...omium.org, keescook@...omium.org, keith.lucas@...cle.com, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, mingo@...nel.org, 
	rick.p.edgecombe@...el.com, sroettger@...gle.com, tglx@...utronix.de, 
	x86@...nel.org
Subject: Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE

On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
>
> > If the alternate signal stack is protected by a different pkey than the
> > current execution stack, copying xsave data to the sigaltstack will fail
> > if its pkey is not enabled in the PKRU register.
> >
> > We do not know which pkey was used by the application for the altstack,
> > so enable all pkeys before xsave.
> >
> > But this updated PKRU value is also pushed onto the sigframe, which
> > means the register value restored from sigcontext will be different from
> > the user-defined one, which is unexpected. Fix that by overwriting the
> > PKRU value on the sigframe with the original, user-defined PKRU.
>
> Hi,
>
> This unfortunatly seems to be broken for rseq user-space writes.
> If the signal is caused by rseq struct being inaccessible due to PKEYs,
> we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> which happens _before_ sig_prepare_pkru and won't succeed
> (PKEY is still inaccessible, hard kills the process).
> Any PKEY sandbox would want to restict untrusted access to rseq
> as well (otherwise allows easy sandbox escapes).
>
> If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> before any copy_to_userpace), then user-space handler gets SIGSEGV
> and could unregister rseq and retry.
>
> However, I am not sure if it's the best solution performance-
> and complexity-wise (for user-space). A better solution may be to
> change __rseq_handle_notify_resume to temporary switch to default
> PKEY if user accesses fail.
> Rseq is similar to signals in this respect. Since rseq updates
> happen asynchronously with respect to user-space control flow,
> if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> it's in trouble and will be randomly killed.
> Since rseq updates are asynchronous as signals, they shouldn't
> assume PKEY is set to default value that allows access
> to rseq descriptor.
>
> Thoughts?

Another question about switching to pkey 0 and not switching back on all errors.
Can it create security problems by allowing sandboxed code to escape?

Namely, here:

+        /* Update PKRU to enable access to the alternate signal stack. */
+        pkru = sig_prepare_pkru();
         /* save i387 and extended state */
-        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
*)buf_fx, math_size, pkru))
+        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
*)buf_fx, math_size, pkru)) {
+                /*
+                 * Restore PKRU to the original, user-defined value; disable
+                 * extra pkeys enabled for the alternate signal stack, if any.
+                 */
+                write_pkru(pkru);
                 return (void __user *)-1L;
+        }

we restore to the original pkru on this error, but there are other
failure paths later, e.g.:
https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199

on these errors paths we will eventually get here to force_sig(SIGSEGV):
https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
which just sends SIGSEGV and is not fatal.

So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
will result in resetting PKRU to 0 without restoring it back.
Or sandboxed code somehow arranges for the first signal setup for other reasons.

This is, of course, a tricky attack vector, and the program must
resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
something lazily and retrying), but with security you never know how
creative an attacker can get and what you are missing that they are
not missing. So it looks safer to restore to the original PKRU on all
errors.