linux-kernel - Re: [PATCH v7 3/4] rseq: Make rseq work with protection keys

Message-ID: <lhuy0ns3971.fsf@oldenburg.str.redhat.com>
Date: Wed, 26 Nov 2025 20:06:26 +0100
From: Florian Weimer <fweimer@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Kevin Brodsky <kevin.brodsky@....com>,  Dmitry Vyukov
 <dvyukov@...gle.com>,  mathieu.desnoyers@...icios.com,
  peterz@...radead.org,  boqun.feng@...il.com,  mingo@...hat.com,
  bp@...en8.de,  dave.hansen@...ux.intel.com,  hpa@...or.com,
  aruna.ramakrishna@...cle.com,  elver@...gle.com,  "Paul E. McKenney"
 <paulmck@...nel.org>,  x86@...nel.org,  linux-kernel@...r.kernel.org,
  Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v7 3/4] rseq: Make rseq work with protection keys

* Thomas Gleixner:

>> I'm less concerned about the impact on restart of restartable sequences
>> because by design, it's a non-modular feature: syscalls and function
>> calls are already banned.  If the code wants to restart, it has to make
>> sure that the access rights at the restart point are correct.  But
>> that's like any other register contents, I think.
>
> It's not only restart. RSEQ is also accessed by the kernel for storing
> CPUID, NODEID, CID. Some of that is used in glibc today, no?

But glibc code cannot run from within an rseq critical section.  And I
think it's not reasonable to expect that if you revoke access to all
allocated protection keys, it's well-defined to call library code.

>> Would it help to allocate a dedicated key for rseq and specify that
>> userspace must always include this access in the accessible set?
>
> That would definitely be helpful to avoid switching PKRU in rseq
> handling code on exit to user space.
>
> Though with the reworked RSEQ code the extra overhead might not be
> horrible. See below.

We might have to dedicate an extra page, too.  So I prefer to avoid it
if possible.
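A rough model of the dedicated-key rule, assuming the x86 PKRU layout (two bits per key: access-disable at bit 2*k, write-disable at bit 2*k+1). The key number 15 and the helper name are hypothetical. The sketch only shows the invariant: whatever PKRU value userspace installs, the rseq key stays readable and writable.

```python
# Model of the x86 PKRU register: 16 keys, 2 bits per key.
PKRU_MASK = 0xFFFFFFFF


def ad_bit(key: int) -> int:
    """Access-disable bit for `key`."""
    return 1 << (2 * key)


def wd_bit(key: int) -> int:
    """Write-disable bit for `key`."""
    return 1 << (2 * key + 1)


def with_rseq_key_enabled(pkru: int, rseq_key: int) -> int:
    """Hypothetical helper: return `pkru` with full access for `rseq_key`.

    Under the proposed rule, every PKRU value userspace installs would
    keep these two bits clear, so the kernel never finds the rseq area
    inaccessible on exit to user space.
    """
    return pkru & ~(ad_bit(rseq_key) | wd_bit(rseq_key)) & PKRU_MASK


RSEQ_KEY = 15  # hypothetical dedicated key

# Revoke every key; only the dedicated rseq key stays open.
print(hex(with_rseq_key_enabled(PKRU_MASK, RSEQ_KEY)))  # 0x3fffffff
```

The price is one of the 16 keys, plus possibly a separate page for the rseq area so that it can carry that key.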

I think I missed the below part?

> But like with signals, just blindly enabling key0 and hoping that it works
> is not really a solution. Nothing prevents me from disabling RSEQ for
> glibc. Then install my own RSEQ page and mprotect it. When that key
> becomes disabled in PKRU and the code section is interrupted then exit
> to user space will fault and die in exactly the same way as
> today. That's progress...

But does that matter?  If I mprotect the stack and a signal arrives,
that results in a crash, too.  Some things just don't work.

> So we really need to sit down and actually define a proper programming
> model first instead of trying to duct tape the current ill defined mess
> forever.
>
> What do we have to take into account:
>
>    1) signals
>
>       Broken as we know already.
>
>       IMO, the proper solution is to provide a mechanism to register a
>       set of permissions which are used for signal delivery. The
>       resulting hardware value should expand the permission, but keep
>       the current active ones enabled.
>
>       That can be kinda kept backwards compatible as the signal perms
>       would default to PKEY0.

I had validated at one point that this works (although the patch that
enables internal pkeys usage in glibc did not exist back then).

  pkeys: Support setting access rights for signal handlers
  <https://lore.kernel.org/linux-mm/5fee976a-42d4-d469-7058-b78ad8897219@redhat.com/>
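A small model of the signal-delivery rule proposed above, under the same PKRU-layout assumption. Because PKRU stores *disable* bits, "expand the permissions but keep the currently active ones enabled" is a bitwise AND: a right stays disabled only if both the current value and the registered signal set disable it. The default of 0x55555554 (key 0 read/write, access disabled for keys 1 to 15) is meant to mirror the kernel's default `init_pkru_value`; the function names are hypothetical.

```python
PKRU_MASK = 0xFFFFFFFF
# Key 0 fully accessible, access-disable set for keys 1..15.
DEFAULT_SIGNAL_PKRU = 0x55555554


def signal_delivery_pkru(current: int, signal_set: int = DEFAULT_SIGNAL_PKRU) -> int:
    """PKRU value for running a signal handler (model of the proposal).

    Every bit is a *disable* bit, so AND yields the union of the rights
    granted by `current` and by the registered `signal_set`.
    """
    return current & signal_set & PKRU_MASK


def rights(pkru: int, key: int) -> tuple[bool, bool]:
    """Return (can_read, can_write) for `key` under `pkru`."""
    ad = (pkru >> (2 * key)) & 1
    wd = (pkru >> (2 * key + 1)) & 1
    return (not ad, not ad and not wd)


# Thread restricted to key 2 only: every key access-disabled except key 2.
current = 0x55555555 & ~(1 << 4)          # 0x55555545
handler = signal_delivery_pkru(current)   # 0x55555544
print(rights(handler, 0), rights(handler, 2), rights(handler, 1))
# (True, True) (True, True) (False, False)
```

The handler gets key 0 back while the thread's own key 2 stays usable, which is the backwards-compatible default described in the proposal.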

>    2) rseq
>
>       The option of having a separate key which needs to be always
>       enabled is definitely simple, but it wastes a key just for
>       that. There are only 16 of them :(
>
>       If we solve the signal case with an explicit permission set, we
>       can just reuse those signal permissions. They are maybe wider than
>       what's required to access RSEQ, but the signal permissions have to
>       include the TLS/RSEQ area to actually work.

Would it address the use case for single-colored memory access?  Or
would that still crash if the process gets descheduled while the access
rights register is set to the restricted value?
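Reusing the signal permission set for rseq, as proposed in point 2), can be modeled the same way: before storing CPUID, NODEID and CID into the rseq area, the kernel would expand the current PKRU with the signal set and check that the area's key is writable. This is a sketch under the same assumptions; the key numbers and the 0x55555514 signal set are made up for illustration, and the model does not by itself settle the preemption question above.

```python
PKRU_MASK = 0xFFFFFFFF
DEFAULT_SIGNAL_PKRU = 0x55555554  # key 0 only


def writable(pkru: int, key: int) -> bool:
    """True if neither access-disable nor write-disable is set for `key`."""
    return ((pkru >> (2 * key)) & 0b11) == 0


def rseq_update_allowed(current: int, rseq_key: int,
                        signal_set: int = DEFAULT_SIGNAL_PKRU) -> bool:
    """Model: can the rseq fields be written under the expanded PKRU?"""
    return writable(current & signal_set & PKRU_MASK, rseq_key)


ALL_DISABLED = PKRU_MASK

# rseq area in key 0 (e.g. glibc's TLS area): works even with all keys revoked.
print(rseq_update_allowed(ALL_DISABLED, 0))              # True
# Self-registered area tagged with key 3, default signal set: still faults.
print(rseq_update_allowed(ALL_DISABLED, 3))              # False
# Signal set that also grants key 3 (bit 6 cleared): works.
print(rseq_update_allowed(ALL_DISABLED, 3, 0x55555514))  # True
```

This only works if the registered signal set covers the key of the rseq area, which is the condition stated in point 2).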

Thanks,
Florian