[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+bu3WtdMLEOj0qFC_MK4G20Joq52Cr8W86Xx4xK3MsB9A@mail.gmail.com>
Date: Tue, 18 Feb 2025 16:37:41 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
aruna.ramakrishna@...cle.com, elver@...gle.com,
"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys
On Tue, 18 Feb 2025 at 16:27, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 2025-02-18 10:10, Dmitry Vyukov wrote:
> > On Tue, 18 Feb 2025 at 15:57, Mathieu Desnoyers
> > <mathieu.desnoyers@...icios.com> wrote:
> >>
> >> On 2025-02-18 02:55, Dmitry Vyukov wrote:
> >>> On Mon, 17 Feb 2025 at 21:21, Mathieu Desnoyers
> >> [...]
> >>>>
> >>>> we still let this function read the rseq_cs.
> >>>>> + * It's unclear what benefits the resticted code gets by doing this
> >>>>
> >>>> restricted
> >>>>
> >>>>> + * (it probably already hijacked control flow at this point), and
> >>>>> + * presumably any sane sandbox should prohibit restricted code
> >>>>> + * from accessing struct rseq, and this is still better than
> >>>>> + * terminating the app unconditionally (it always has a choice
> >>>>> + * of not using rseq and pkeys together).
> >>>>
> >>>> Note that because userspace can complete an rseq critical section
> >>>> without clearing the rseq_cs pointer, this could happen simply because
> >>>> the kernel is preempting the task after it has:
> >>>>
> >>>> 1) completed an rseq critical section, without clearing rseq_cs,
> >>>> 2) changed pkey.
> >>>>
> >>>> So allowing this is important, and I would remove the comment about
> >>>> hijacked control flow and such. This can happen with normal use of the
> >>>> ABI.
> >>>
> >>> Thanks for the review!
> >>>
> >>> I've addressed all comments in the series in v2.
> >>>
> >>> I've reworded this paragraph to simplify sentences, but I still kept
> >>> the note aboud malicious rseq_cs.
> >>>
> >>> If we would not be circumventing normal protection, then, yes, these
> >>> cases would be the same. But since we are circumventing protection
> >>> that otherwise exists, I think it's important to think about
> >>> potentially malicious cases. In this context inaccessible rseq_cs
> >>> values that resulted from normal execution are very different from
> >>> malicious onces. Normal ones will point to a fixed set of real
> >>> well-formed rseq_cs objects, while malicious ones may point to
> >>> god-knows-where in an attempt of an attacker to do things we can't
> >>> even imagine right now (e.g. rseq_cs overlapping with protected crypto
> >>> keys).
> >>>
> >>> It's as if a particular instance of copy_to_user would allow
> >>> user-space to write arbitrary kernel memory, and memory of other
> >>> processes circumventing all normal protections. In that context we
> >>> would need to be very careful regarding what we actually allow.
> >>
> >> I'm considering that we should clear the rseq_cs pointer whenever
> >> userspace issues pkey_mprotect.
> >>
> >> This would ensure that no legitimate scenario can trigger a load
> >> from a rseq_cs area which has the wrong pkey, and therefore we
> >> could accept read/write from/to a struct rseq which has the wrong
> >> pkey, but kill the process if trying to read/write from a
> >> struct rseq_cs with the wrong key. This would prevent userspace
> >> from making the kernel read/write from/to memory with the wrong
> >> pkey through a pointer it controls (rseq_cs pointer).
> >>
> >> Thoughts ?
> >
> > I am not following.
> >
> > There are pkey_mprotect calls, then independently installs on rseq_cs
> > pointers that happen concurrently and after pkey_mprotect, and
> > independent set of pkey_set calls that happens concurrently and after
> > the previous 2.
> > I don't see how doing something at the pkey_mprotect call for the
> > single thread avoids any scenarios.
>
> Hrm. Sorry, I mixed up pkey_set() vs pkey_mprotect(). What I had in mind
> was actually pkey_set(). And that would need to clear rseq_cs for all
> threads belonging to the process, which may not be straightforward
> because those could legitimately be inside a rseq critical section.
>
> OK, let's try another approach: rather than kill the process if
> read/write of the rseq_cs area with the wrong key fails, could we simply
> clear the rseq_cs pointer in that case ? Technically there would be no
> legitimate use of this except for the case where it is meant to be lazily
> cleared.
This may work, but 2 concerns with this:
1. We don't know if the failure happened due to pkeys or not (at least
not easily), and I am afraid of touching the logic for other failures.
If the rseq_cs was a bogus pointer, or protected with normal mprotect,
what does it mean? Are we masking a programming bug? Are we
circumventing some other protections that were supposed to lead to the
process termination?
2. This will complicate __rseq_handle_notify_resume() logic as it
would need to handle failures when accessing rseq and rseq_cs
differently (+plus there is signature check). The more complex the
logic, the higher chances of adding a bug now or in future.
> Thanks,
>
> Mathieu
>
> > Moreover, pkey 0 is preinstalled for all pages, but access to it can
> > be revoked in future.
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
Powered by blists - more mailing lists