linux-kernel - Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACT4Y+ZYKpZAu0UtZ9N4EjC+qkes5oenwNPceJqkk7XrJD7g=A@mail.gmail.com>
Date: Tue, 11 Feb 2025 07:47:10 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Jeff Xu <jeffxu@...omium.org>
Cc: aruna.ramakrishna@...cle.com, mathieu.desnoyers@...icios.com, 
	peterz@...radead.org, paulmck@...nel.org, boqun.feng@...il.com, 
	dave.hansen@...ux.intel.com, jannh@...gle.com, jorgelo@...omium.org, 
	keescook@...omium.org, keith.lucas@...cle.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, mingo@...nel.org, rick.p.edgecombe@...el.com, 
	sroettger@...gle.com, tglx@...utronix.de, x86@...nel.org
Subject: Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE

On Mon, 10 Feb 2025 at 23:46, Jeff Xu <jeffxu@...omium.org> wrote:
>
> Hi Dmitry
>
> On Thu, Feb 6, 2025 at 10:06 AM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> >
> > On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > >
> > > Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> > > https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
> > >
> > > > If the alternate signal stack is protected by a different pkey than the
> > > > current execution stack, copying xsave data to the sigaltstack will fail
> > > > if its pkey is not enabled in the PKRU register.
> > > >
> > > > We do not know which pkey was used by the application for the altstack,
> > > > so enable all pkeys before xsave.
> > > >
> > > > But this updated PKRU value is also pushed onto the sigframe, which
> > > > means the register value restored from sigcontext will be different from
> > > > the user-defined one, which is unexpected. Fix that by overwriting the
> > > > PKRU value on the sigframe with the original, user-defined PKRU.
> > >
> > > Hi,
> > >
> > > This unfortunatly seems to be broken for rseq user-space writes.
> > > If the signal is caused by rseq struct being inaccessible due to PKEYs,
> > > we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> > > which happens _before_ sig_prepare_pkru and won't succeed
> > > (PKEY is still inaccessible, hard kills the process).
> > > Any PKEY sandbox would want to restict untrusted access to rseq
> > > as well (otherwise allows easy sandbox escapes).
> > >
> > > If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> > > before any copy_to_userpace), then user-space handler gets SIGSEGV
> > > and could unregister rseq and retry.
> > >
> > > However, I am not sure if it's the best solution performance-
> > > and complexity-wise (for user-space). A better solution may be to
> > > change __rseq_handle_notify_resume to temporary switch to default
> > > PKEY if user accesses fail.
> > > Rseq is similar to signals in this respect. Since rseq updates
> > > happen asynchronously with respect to user-space control flow,
> > > if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> > > it's in trouble and will be randomly killed.
> > > Since rseq updates are asynchronous as signals, they shouldn't
> > > assume PKEY is set to default value that allows access
> > > to rseq descriptor.
> > >
> > > Thoughts?
> >
> > Another question about switching to pkey 0 and not switching back on all errors.
> > Can it create security problems by allowing sandboxed code to escape?
> >
> Sandbox escape would be bad , we wouldn't want the calling thread to
> get PKRU = 0 in any error path.
>
> > Namely, here:
> >
> > +        /* Update PKRU to enable access to the alternate signal stack. */
> > +        pkru = sig_prepare_pkru();
> >          /* save i387 and extended state */
> > -        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> > *)buf_fx, math_size, pkru))
> > +        if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> > *)buf_fx, math_size, pkru)) {
> > +                /*
> > +                 * Restore PKRU to the original, user-defined value; disable
> > +                 * extra pkeys enabled for the alternate signal stack, if any.
> > +                 */
> > +                write_pkru(pkru);
> >                  return (void __user *)-1L;
> > +        }
> >
> > we restore to the original pkru on this error, but there are other
> > failure paths later, e.g.:
> > https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
> >
> > on these errors paths we will eventually get here to force_sig(SIGSEGV):
> > https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
> > which just sends SIGSEGV and is not fatal.
> >
> > So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
> > which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
> > will result in resetting PKRU to 0 without restoring it back.
> > Or sandboxed code somehow arranges for the first signal setup for other reasons.
> >
> Can you walk me through the setup and steps that led to this situation?

I don't anything more concrete steps, just the observation that PKRU
is restored only on 1 out of N error paths.

> > This is, of course, a tricky attack vector, and the program must
> > resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
> > something lazily and retrying), but with security you never know how
> > creative an attacker can get and what you are missing that they are
> > not missing. So it looks safer to restore to the original PKRU on all
> > errors.
>
> Thanks
> -Jeff