linux-kernel - Re: [PATCH 3/4] rseq: Make rseq work with protection keys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+bKde42MqWsxXD99RZhDZkkNs58kv1H9dxPAh13QLqNEw@mail.gmail.com>
Date: Fri, 21 Feb 2025 12:22:16 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, 
	aruna.ramakrishna@...cle.com, elver@...gle.com, 
	"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys

On Tue, 18 Feb 2025 at 16:37, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> On Tue, 18 Feb 2025 at 16:27, Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
> >
> > On 2025-02-18 10:10, Dmitry Vyukov wrote:
> > > On Tue, 18 Feb 2025 at 15:57, Mathieu Desnoyers
> > > <mathieu.desnoyers@...icios.com> wrote:
> > >>
> > >> On 2025-02-18 02:55, Dmitry Vyukov wrote:
> > >>> On Mon, 17 Feb 2025 at 21:21, Mathieu Desnoyers
> > >> [...]
> > >>>>
> > >>>>     we still let this function read the rseq_cs.
> > >>>>> +      * It's unclear what benefits the resticted code gets by doing this
> > >>>>
> > >>>> restricted
> > >>>>
> > >>>>> +      * (it probably already hijacked control flow at this point), and
> > >>>>> +      * presumably any sane sandbox should prohibit restricted code
> > >>>>> +      * from accessing struct rseq, and this is still better than
> > >>>>> +      * terminating the app unconditionally (it always has a choice
> > >>>>> +      * of not using rseq and pkeys together).
> > >>>>
> > >>>> Note that because userspace can complete an rseq critical section
> > >>>> without clearing the rseq_cs pointer, this could happen simply because
> > >>>> the kernel is preempting the task after it has:
> > >>>>
> > >>>> 1) completed an rseq critical section, without clearing rseq_cs,
> > >>>> 2) changed pkey.
> > >>>>
> > >>>> So allowing this is important, and I would remove the comment about
> > >>>> hijacked control flow and such. This can happen with normal use of the
> > >>>> ABI.
> > >>>
> > >>> Thanks for the review!
> > >>>
> > >>> I've addressed all comments in the series in v2.
> > >>>
> > >>> I've reworded this paragraph to simplify sentences, but I still kept
> > >>> the note aboud malicious rseq_cs.
> > >>>
> > >>> If we would not be circumventing normal protection, then, yes, these
> > >>> cases would be the same. But since we are circumventing protection
> > >>> that otherwise exists, I think it's important to think about
> > >>> potentially malicious cases. In this context inaccessible rseq_cs
> > >>> values that resulted from normal execution are very different from
> > >>> malicious onces. Normal ones will point to a fixed set of real
> > >>> well-formed rseq_cs objects, while malicious ones may point to
> > >>> god-knows-where in an attempt of an attacker to do things we can't
> > >>> even imagine right now (e.g. rseq_cs overlapping with protected crypto
> > >>> keys).
> > >>>
> > >>> It's as if a particular instance of copy_to_user would allow
> > >>> user-space to write arbitrary kernel memory, and memory of other
> > >>> processes circumventing all normal protections. In that context we
> > >>> would need to be very careful regarding what we actually allow.
> > >>
> > >> I'm considering that we should clear the rseq_cs pointer whenever
> > >> userspace issues pkey_mprotect.
> > >>
> > >> This would ensure that no legitimate scenario can trigger a load
> > >> from a rseq_cs area which has the wrong pkey, and therefore we
> > >> could accept read/write from/to a struct rseq which has the wrong
> > >> pkey, but kill the process if trying to read/write from a
> > >> struct rseq_cs with the wrong key. This would prevent userspace
> > >> from making the kernel read/write from/to memory with the wrong
> > >> pkey through a pointer it controls (rseq_cs pointer).
> > >>
> > >> Thoughts ?
> > >
> > > I am not following.
> > >
> > > There are pkey_mprotect calls, then independently installs on rseq_cs
> > > pointers that happen concurrently and after pkey_mprotect, and
> > > independent set of pkey_set calls that happens concurrently and after
> > > the previous 2.
> > > I don't see how doing something at the pkey_mprotect call for the
> > > single thread avoids any scenarios.
> >
> > Hrm. Sorry, I mixed up pkey_set() vs pkey_mprotect(). What I had in mind
> > was actually pkey_set(). And that would need to clear rseq_cs for all
> > threads belonging to the process, which may not be straightforward
> > because those could legitimately be inside a rseq critical section.
> >
> > OK, let's try another approach: rather than kill the process if
> > read/write of the rseq_cs area with the wrong key fails, could we simply
> > clear the rseq_cs pointer in that case ? Technically there would be no
> > legitimate use of this except for the case where it is meant to be lazily
> > cleared.
>
> This may work, but 2 concerns with this:
> 1. We don't know if the failure happened due to pkeys or not (at least
> not easily), and I am afraid of touching the logic for other failures.
> If the rseq_cs was a bogus pointer, or protected with normal mprotect,
> what does it mean? Are we masking a programming bug? Are we
> circumventing some other protections that were supposed to lead to the
> process termination?
> 2. This will complicate __rseq_handle_notify_resume() logic as it
> would need to handle failures when accessing rseq and rseq_cs
> differently (+plus there is signature check). The more complex the
> logic, the higher chances of adding a bug now or in future.


Do you see any problem with the current code? What exactly is it?
If not, can we merge it as is?