[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b42dc8d7-2f2e-466f-bdca-0d532a0d5928@efficios.com>
Date: Tue, 25 Feb 2025 09:28:44 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
aruna.ramakrishna@...cle.com, elver@...gle.com,
"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 3/4] rseq: Make rseq work with protection keys
On 2025-02-25 09:07, Dmitry Vyukov wrote:
> On Mon, 24 Feb 2025 at 20:18, Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
>>
>> On 2025-02-24 08:20, Dmitry Vyukov wrote:
>>> If an application registers rseq, and ever switches to another pkey
>>> protection (such that the rseq becomes inaccessible), then any
>>> context switch will cause failure in __rseq_handle_notify_resume()
>>> attempting to read/write struct rseq and/or rseq_cs. Since context
>>> switches are asynchronous and are outside of the application control
>>> (not part of the restricted code scope), temporarily switch to
>>> pkey value that allows access to the 0 (default) PKEY.
>>
>> This is a good start, but the plan Dave and I discussed went further
>> than this. Those additions are needed:
>>
>> 1) Add validation at rseq registration that the struct rseq is indeed
>> pkey-0 memory (return failure if not).
>
> I don't think this is worth it for multiple reasons:
> - a program may first register it and then assign a key, which means
> we also need to check in pkey_mprotect
> - pkey_mprotect may be applied to rseq of another thread, so ensuring
> that will require complex code with non-trivial synchronization and
> will add considerable overhead to pkey_mprotect call
> - a program may assign non-0 pkey but have it always accessible, such
> programs will break by the new check
> - the misuse is already detected by rseq code, and UNIX errno-based
> reporting is not very informative and does not add much value on top
> of existing reporting
> - this is not different from registering rseq and then unmap'ing the
> memory, checking that does not look like a good idea, and checking
> only subset of misuses is inconsistent
>
> Based on my experience with rseq, what would be useful is reporting a
> meaningful siginfo for access errors (address/unique code) and fixing
> signal delivery. That would solve all of the above problems, and
> provide useful info for the user (not just confusing EINVAL from
> mprotect/munmap).
>
> But I would prefer to not mix these unrelated usability improvements
> and bug fixes with this change. That's not related to this change.
I agree with your arguments. If Dave is OK with it, I'd be fine with
leaving out the pkey-0 validation on rseq registration, and eventually
bring meaningful siginfo access errors as future improvements.
So the new behavior would be that both rseq and rseq_cs are required
to be pkey-0. If they are not and their pkey is not accessible in the
current context, it would trigger a segmentation fault. Ideally we'd
want to document this somewhere in the UAPI header.
>
>
>> 2) The pkey-0 requirement is only for struct rseq, which we can check
>> for at rseq registration, and happens to be the fast path. For struct
>> rseq_cs, this is not the same tradeoff: we cannot easily check its
>> associated pkey because the rseq_cs pointer is updated by userspace
>> when entering a critical section. But the good news is that reading
>> the content of struct rseq_cs is *not* a fast-path: it's only done
>> when preempting/delivering a signal over a thread which has a
>> non-NULL rseq_cs pointer.
>
> rseq_cs is usually accessed on a hot path since rseq_cs pointer is not
> cleared on critical section exit (at least that's what we do).
Fair point.
>
>> Therefore reading the struct rseq_cs content should be done with
>> write_permissive_pkey_val(), giving access to all pkeys.
>
> You just asked me to redo the code to simplify it, won't this
> complicate it back again? ;)
I'm fine with the pkey-0 approach for both rseq and rseq_cs if Dave is
also OK with it.
Thanks,
Mathieu
>
>
>> Thanks,
>>
>> Mathieu
>>
>>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>> Cc: "Paul E. McKenney" <paulmck@...nel.org>
>>> Cc: Boqun Feng <boqun.feng@...il.com>
>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>> Cc: Ingo Molnar <mingo@...hat.com>
>>> Cc: Borislav Petkov <bp@...en8.de>
>>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>>> Cc: "H. Peter Anvin" <hpa@...or.com>
>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
>>> Cc: x86@...nel.org
>>> Cc: linux-kernel@...r.kernel.org
>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>
>>> ---
>>> Changes in v4:
>>> - Added Fixes tag
>>>
>>> Changes in v3:
>>> - simplify control flow to always enable access to 0 pkey
>>>
>>> Changes in v2:
>>> - fixed typos and reworded the comment
>>> ---
>>> kernel/rseq.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>>> index 2cb16091ec0ae..9d9c976d3b78c 100644
>>> --- a/kernel/rseq.c
>>> +++ b/kernel/rseq.c
>>> @@ -10,6 +10,7 @@
>>>
>>> #include <linux/sched.h>
>>> #include <linux/uaccess.h>
>>> +#include <linux/pkeys.h>
>>> #include <linux/syscalls.h>
>>> #include <linux/rseq.h>
>>> #include <linux/types.h>
>>> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>>> void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>>> {
>>> struct task_struct *t = current;
>>> + pkey_reg_t saved_pkey;
>>> int ret, sig;
>>>
>>> if (unlikely(t->flags & PF_EXITING))
>>> return;
>>>
>>> + /*
>>> + * Enable access to the default (0) pkey in case the thread has
>>> + * currently disabled access to it and struct rseq/rseq_cs has
>>> + * 0 pkey assigned (the only supported value for now).
>>> + */
>>> + saved_pkey = enable_zero_pkey_val();
>>> +
>>> /*
>>> * regs is NULL if and only if the caller is in a syscall path. Skip
>>> * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
>>> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>>> }
>>> if (unlikely(rseq_update_cpu_node_id(t)))
>>> goto error;
>>> + write_pkey_val(saved_pkey);
>>> return;
>>>
>>> error:
>>> + write_pkey_val(saved_pkey);
>>> sig = ksig ? ksig->sig : 0;
>>> force_sigsegv(sig);
>>> }
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Powered by blists - more mailing lists