lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Ym7=9mLS8b=Rq6cyHMgyboMqh15nqkRfgru-qFVTx_0A@mail.gmail.com>
Date: Tue, 18 Feb 2025 08:55:39 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, 
	aruna.ramakrishna@...cle.com, elver@...gle.com, 
	"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys

On Mon, 17 Feb 2025 at 21:21, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 2025-02-17 06:07, Dmitry Vyukov wrote:
> > If an application registers rseq, and ever switches to another pkey
> > protection (such that the rseq becomes inaccessible), then any
> > context switch will cause failure in __rseq_handle_notify_resume()
> > attempting to read/write struct rseq and/or rseq_cs. Since context
> > switches are asynchronous and are outside of the application control
> > (not part of the restricted code scope), temporarily switch to
> > premissive pkey register to read/write rseq/rseq_cs, similarly
>
> permissive
>
> > to signal delivery accesses to altstack.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: "Paul E. McKenney" <paulmck@...nel.org>
> > Cc: Boqun Feng <boqun.feng@...il.com>
> > Cc: Thomas Gleixner <tglx@...utronix.de>
> > Cc: Ingo Molnar <mingo@...hat.com>
> > Cc: Borislav Petkov <bp@...en8.de>
> > Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> > Cc: "H. Peter Anvin" <hpa@...or.com>
> > Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
> > Cc: x86@...nel.org
> > Cc: linux-kernel@...r.kernel.org
> > ---
> >   kernel/rseq.c | 36 ++++++++++++++++++++++++++++++++++++
> >   1 file changed, 36 insertions(+)
> >
> > diff --git a/kernel/rseq.c b/kernel/rseq.c
> > index 442aba29bc4cf..31cd94b370ef3 100644
> > --- a/kernel/rseq.c
> > +++ b/kernel/rseq.c
> > @@ -10,6 +10,7 @@
> >
> >   #include <linux/sched.h>
> >   #include <linux/uaccess.h>
> > +#include <linux/pkeys.h>
> >   #include <linux/syscalls.h>
> >   #include <linux/rseq.h>
> >   #include <linux/types.h>
> > @@ -403,10 +404,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> >   {
> >       struct task_struct *t = current;
> >       int ret, sig;
> > +     pkey_reg_t saved;
> > +     bool switched_pkey_reg = false;
> >
> >       if (unlikely(t->flags & PF_EXITING))
> >               return;
> >
> > +retry:
> >       /*
> >        * regs is NULL if and only if the caller is in a syscall path.  Skip
> >        * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> > @@ -419,9 +423,41 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> >       }
> >       if (unlikely(rseq_update_cpu_node_id(t)))
> >               goto error;
> > +     if (switched_pkey_reg)
> > +             write_pkey_reg(saved);
> >       return;
> >
> >   error:
> > +     /*
> > +      * If the application registers rseq, and ever switches to another
> > +      * pkey protection (such that the rseq becomes inaccessible), then
> > +      * any context switch will cause failure here attempting to read/write
> > +      * struct rseq and/or rseq_cs. Since context switches are
> > +      * asynchronous and are outside of the application control
> > +      * (not part of the restricted code scope), we temporarily switch
>
> Remove "we".
>
> > +      * to premissive pkey register to read/write rseq/rseq_cs,
>
> permissive
>
> > +      * similarly to signal delivery accesses to altstack.
> > +      *
> > +      * We don't bother to check if the failure really happened due to
>
> Remove "We".
>
> > +      * pkeys or not, since it does not matter (performance-wise and
> > +      * otherwise).
> > +      *
> > +      * If the restricted code installs rseq_cs in inaccessible to it
> > +      * due to pkeys memory,
>
> This sentence should be reworded.
>
>   we still let this function read the rseq_cs.
> > +      * It's unclear what benefits the resticted code gets by doing this
>
> restricted
>
> > +      * (it probably already hijacked control flow at this point), and
> > +      * presumably any sane sandbox should prohibit restricted code
> > +      * from accessing struct rseq, and this is still better than
> > +      * terminating the app unconditionally (it always has a choice
> > +      * of not using rseq and pkeys together).
>
> Note that because userspace can complete an rseq critical section
> without clearing the rseq_cs pointer, this could happen simply because
> the kernel is preempting the task after it has:
>
> 1) completed an rseq critical section, without clearing rseq_cs,
> 2) changed pkey.
>
> So allowing this is important, and I would remove the comment about
> hijacked control flow and such. This can happen with normal use of the
> ABI.

Thanks for the review!

I've addressed all comments in the series in v2.

I've reworded this paragraph to simplify sentences, but I still kept
the note aboud malicious rseq_cs.

If we would not be circumventing normal protection, then, yes, these
cases would be the same. But since we are circumventing protection
that otherwise exists, I think it's important to think about
potentially malicious cases. In this context inaccessible rseq_cs
values that resulted from normal execution are very different from
malicious onces. Normal ones will point to a fixed set of real
well-formed rseq_cs objects, while malicious ones may point to
god-knows-where in an attempt of an attacker to do things we can't
even imagine right now (e.g. rseq_cs overlapping with protected crypto
keys).

It's as if a particular instance of copy_to_user would allow
user-space to write arbitrary kernel memory, and memory of other
processes circumventing all normal protections. In that context we
would need to be very careful regarding what we actually allow.


> Thanks,
>
> Mathieu
>
>
> > +      */
> > +     if (!switched_pkey_reg) {
> > +             switched_pkey_reg = true;
> > +             saved = switch_to_permissive_pkey_reg();
> > +             goto retry;
> > +     } else {
> > +             write_pkey_reg(saved);
> > +     }
> >       sig = ksig ? ksig->sig : 0;
> >       force_sigsegv(sig);
> >   }
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ