linux-kernel - Re: [PATCH 3/4] rseq: Make rseq work with protection keys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CACT4Y+Z=q8CX5x01o2vZA3n+VjuQD2uUKsd-osCEf8LziDUZ1Q@mail.gmail.com>
Date: Mon, 24 Feb 2025 14:35:24 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, peterz@...radead.org, 
	boqun.feng@...il.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, hpa@...or.com, aruna.ramakrishna@...cle.com, 
	elver@...gle.com, "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys

On Fri, 21 Feb 2025 at 22:45, Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 2/21/25 13:36, Mathieu Desnoyers wrote:
> >>>
> >>
> >> Because the rseq return to userspace handler is called on every return
> >> to userspace after a task is scheduled back after preemption, I am
> >> concerned about the overhead that would be added by a WRPKRU on the
> >> fast-path, given that it acts as as barrier against speculation. Issuing
> >> WRPKRU only after checking that pkey-0 is not accessible appears to be
> >> moving the overhead to a much less common case.
> >
> > Actually, we should distinguish between two accesses here:
> >
> > A) loads/stores from/to struct rseq
> >
> > B) loads from struct rseq_cs (only happens on rseq abort)
> >
> > (A) is a fast-path executed on return to userspace after a preemption.
> > In order to make it fast, we could require that struct rseq is pkey-0
> > and typically skip any WRPKRU for this access when pkey-0 is already
> > accessible. We can add a check on rseq registration to make sure that
> > struct rseq is indeed pkey-0, and reject it with an error if not. This
> > should help make the ABI robust and less error-prone.
> >
> > Now for (B), it's a slow path. When we observe that rseq->rseq_cs is
> > not NULL, we can simply override with a permissive pkey to make sure
> > the rseq_cs access will work.
> >
> > Thoughts ?
> I think this will be the first ABI which is explicitly pkey-0-only. I
> suspect there are a few more of these that are implicit but we just
> haven't found them yet.
>
> I wouldn't have any objections about doing this, especially given
> sanity checking at rseq registration.


Thanks for the thoughtful review.

I've tried to incorporate all suggestions in v4:
https://lore.kernel.org/all/68427864e0ca38af06482c96728216c3e0973418.1740403209.git.dvyukov@google.com/T/#m431c13b6140e447d41d228fb942d9e4b8a89874a

Yes, the intention for doing the switch only on the error path was to
avoid WRPKRU on the hot path.

For my current use case (at least how I currently have it implemented)
struct rseq and altstack are indeed protected with pkey 0. So I added
a new enable_zero_pkey_val() function that lazily enables only pkey 0.

Also added Fixes tag (for everything except for signal.c refactoring).
The "fixed" commit seems to be the original rseq patch d7822b1e24f2
("rseq: Introduce restartable sequences system call") b/c PKEYs
introduction is older.

I also had to significantly rework the test to make it work with rseq
protected by pkey 0 (which means we need to revoke access to pkey 0,
which means stack/tls/errno also become inaccessible).

Please take another look at v4.

I don't have preference as to how this should get into the Linus tree.
Hopefully the changes are not too pervasive for all subsystems to
cause massive conflicts when merging.