Message-ID: <873460h5yb.ffs@tglx>
Date: Wed, 26 Nov 2025 21:52:44 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Florian Weimer <fweimer@...hat.com>
Cc: Kevin Brodsky <kevin.brodsky@....com>, Dmitry Vyukov
<dvyukov@...gle.com>, mathieu.desnoyers@...icios.com,
peterz@...radead.org, boqun.feng@...il.com, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
aruna.ramakrishna@...cle.com, elver@...gle.com, "Paul E. McKenney"
<paulmck@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org, Jens
Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v7 3/4] rseq: Make rseq work with protection keys

On Wed, Nov 26 2025 at 20:06, Florian Weimer wrote:
> * Thomas Gleixner:
>> But like with signals just blindly enabling key0 and hope that it works
>> is not really a solution. Nothing prevents me from disabling RSEQ for
>> glibc. Then install my own RSEQ page and mprotect it. When that key
>> becomes disabled in PKRU and the code section is interrupted then exit
>> to user space will fault and die in exactly the same way as
>> today. That's progress...
>
> But does that matter? If I mprotect the stack and a signal arrives,
> that results in a crash, too. Some things just don't work.

They can be made to work when we have a dedicated permission setting for
signals, which can be used for rseq access too. And having explicit
signal permissions makes a lot of sense independent of the above absurd
use case, which I just used for illustration.

>> So we really need to sit down and actually define a proper programming
>> model first instead of trying to duct tape the current ill defined mess
>> forever.
>>
>> What do we have to take into account:
>>
>> 1) signals
>>
>> Broken as we know already.
>>
>> IMO, the proper solution is to provide a mechanism to register a
>> set of permissions which are used for signal delivery. The
>> resulting hardware value should expand the permission, but keep
>> the current active ones enabled.
>>
>> That can be kinda kept backwards compatible as the signal perms
>> would default to PKEY0.
>
> I had validated at one point that this works (although the patch that
> enables internal pkeys usage in glibc did not exist back then).
>
> pkeys: Support setting access rights for signal handlers
> <https://lore.kernel.org/linux-mm/5fee976a-42d4-d469-7058-b78ad8897219@redhat.com/>

That looks about right and is what I had in mind. Seems I missed that back
in the days and that discussion unfortunately ran into a dead end :(

>> 2) rseq
>>
>> The option of having a separate key which needs to be always
>> enabled is definitely simple, but it wastes a key just for
>> that. There are only 16 of them :(
>>
>> If we solve the signal case with an explicit permission set, we
>> can just reuse those signal permissions. They are maybe wider than
>> what's required to access RSEQ, but the signal permissions have to
>> include the TLS/RSEQ area to actually work.
>
> Would it address the use case for single-colored memory access? Or
> would that still crash if the process gets descheduled while the access
> rights register is set to the restricted value?

It would just work the same way as signals. Assume

     signal_perms = [PK0=RW, PK1=R, PK2=RW]

     set_pkey(PK0..6=NONE, PK7=R)
     access()                 <- can fault
                              <- or interrupt can happen
     set_pkey(normal)

So when the fault or interrupt results in a signal and/or the return to
user space needs to access RSEQ we have in signal delivery:

     cur = pkey_extend(signal_perms);
     --> Perms are now [PK0=RW, PK1=R, PK2=RW, PK7=R]
     access_user_stack();
     ....
     // Return with the extended permissions to deliver the signal
     // Will be restored on sigreturn

and in rseq:

     cur = pkey_extend(signal_perms);
     --> Perms are now [PK0=RW, PK1=R, PK2=RW, PK7=R]
     access_user_rseq();
     pkey_set(cur);

If the RSEQ access is nested in the signal delivery return then nothing
happens as the permissions are not changing because they are already
extended: A | A = A :).
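
As an illustration only (this is not kernel code), the extension step
described above can be modelled in user space on a plain 32-bit value laid
out like the x86 PKRU register: two bits per key, access-disable in the low
bit and write-disable in the high bit. Because PKRU stores "disable" bits,
widening permissions is a bitwise AND of the two register values. The names
model_pkey_extend(), pkru_set(), pkru_get() and the PERM_* constants are
hypothetical helpers invented for this sketch, and it assumes "no access" is
always encoded with both bits set so that the AND never widens beyond the
union of the two permission sets.

```c
#include <assert.h>
#include <stdint.h>

/* Per-key field in an x86-style PKRU value: bit 0 = AD, bit 1 = WD. */
enum {
	PERM_RW   = 0x0,	/* neither disable bit set */
	PERM_R    = 0x2,	/* write disable only */
	PERM_NONE = 0x3,	/* assumption: canonical "none" sets both bits */
};

#define NR_PKEYS 16u

/* Start with every key fully disabled. */
static uint32_t pkru_all_none(void)
{
	return 0xffffffffu;
}

/* Return a copy of pkru with the permission of one key replaced. */
static uint32_t pkru_set(uint32_t pkru, unsigned int key, uint32_t perm)
{
	assert(key < NR_PKEYS);
	assert(perm == PERM_RW || perm == PERM_R || perm == PERM_NONE);
	pkru &= ~(0x3u << (2 * key));
	return pkru | (perm << (2 * key));
}

/* Read back the two-bit permission field of one key. */
static uint32_t pkru_get(uint32_t pkru, unsigned int key)
{
	assert(key < NR_PKEYS);
	return (pkru >> (2 * key)) & 0x3u;
}

/*
 * Hypothetical model of the extension step: a disable bit stays set only
 * if it is set in both inputs, so the result grants the union of the
 * current permissions and the registered signal permissions.
 */
static uint32_t model_pkey_extend(uint32_t cur, uint32_t signal_perms)
{
	return cur & signal_perms;
}
```

With signal_perms = [PK0=RW, PK1=R, PK2=RW] and a current value of
[PK0..6=NONE, PK7=R], the model yields [PK0=RW, PK1=R, PK2=RW, PK7=R], and
applying it a second time leaves the value unchanged, which is the nested
signal plus rseq case.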

The kernel does not care about the PKEY permissions when the user to
kernel transition is due to an interrupt/exception, except for the signal
and rseq cases.

In fact the above also works with my made up example. Just assume the
RSEQ page is protected by PK2. :)

Syscalls are a different story as copy_to/from_user() obviously require
the proper permissions and the kernel can rightfully expect that stack
and rseq are accessible, but that's not what we are debating here.

Thanks,
tglx