linux-kernel - Re: [PATCH v4 3/4] rseq: Make rseq work with protection keys

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <47797927-4d87-4101-9834-eac84d814114@efficios.com>
Date: Tue, 25 Feb 2025 09:53:50 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 aruna.ramakrishna@...cle.com, elver@...gle.com,
 "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 3/4] rseq: Make rseq work with protection keys

On 2025-02-25 09:51, Dmitry Vyukov wrote:
> On Tue, 25 Feb 2025 at 15:28, Mathieu Desnoyers
> <mathieu.desnoyers@...icios.com> wrote:
>>
>> On 2025-02-25 09:07, Dmitry Vyukov wrote:
>>> On Mon, 24 Feb 2025 at 20:18, Mathieu Desnoyers
>>> <mathieu.desnoyers@...icios.com> wrote:
>>>>
>>>> On 2025-02-24 08:20, Dmitry Vyukov wrote:
>>>>> If an application registers rseq, and ever switches to another pkey
>>>>> protection (such that the rseq becomes inaccessible), then any
>>>>> context switch will cause failure in __rseq_handle_notify_resume()
>>>>> attempting to read/write struct rseq and/or rseq_cs. Since context
>>>>> switches are asynchronous and are outside of the application's control
>>>>> (not part of the restricted code scope), temporarily switch to a
>>>>> pkey value that allows access to the default pkey (0).
>>>>
>>>> This is a good start, but the plan Dave and I discussed went further
>>>> than this. Those additions are needed:
>>>>
>>>> 1) Add validation at rseq registration that the struct rseq is indeed
>>>>       pkey-0 memory (return failure if not).
>>>
>>> I don't think this is worth it for multiple reasons:
>>>    - a program may first register it and then assign a key, which means
>>> we also need to check in pkey_mprotect
>>>    - pkey_mprotect may be applied to the rseq area of another thread, so
>>> ensuring that will require complex code with non-trivial synchronization
>>> and will add considerable overhead to the pkey_mprotect call
>>>    - a program may assign a non-0 pkey but keep it always accessible;
>>> such programs would be broken by the new check
>>>    - the misuse is already detected by rseq code, and UNIX errno-based
>>> reporting is not very informative and does not add much value on top
>>> of existing reporting
>>>    - this is no different from registering rseq and then unmapping the
>>> memory; checking for that does not look like a good idea, and checking
>>> only a subset of misuses is inconsistent
>>>
>>> Based on my experience with rseq, what would be useful is reporting a
>>> meaningful siginfo for access errors (address/unique code) and fixing
>>> signal delivery. That would solve all of the above problems, and
>>> provide useful info for the user (not just confusing EINVAL from
>>> mprotect/munmap).
>>>
>>> But I would prefer to not mix these unrelated usability improvements
>>> and bug fixes with this change. That's not related to this change.
>>
>> I agree with your arguments. If Dave is OK with it, I'd be fine with
>> leaving out the pkey-0 validation on rseq registration, and eventually
>> bring meaningful siginfo access errors as future improvements.
>>
>> So the new behavior would be that both rseq and rseq_cs are required
>> to be pkey-0. If they are not and their pkey is not accessible in the
>> current context, it would trigger a segmentation fault. Ideally we'd
>> want to document this somewhere in the UAPI header.
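To make the agreed rule concrete, here is a small user-space model of it. It assumes x86 PKRU semantics (for key k, bit 2k is Access-Disable and bit 2k+1 is Write-Disable) and that the kernel writes struct rseq but only reads struct rseq_cs. Because the kernel grants itself access to pkey 0 first, only a non-zero key that the thread has disabled leads to SIGSEGV. All names here (`model_rseq_outcome` and friends) are hypothetical; this is not kernel code, and it ignores other failure causes such as unmapped memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PKRU: for protection key k (0..15), bit 2k is Access-Disable (AD)
 * and bit 2k+1 is Write-Disable (WD). */
#define PKRU_AD(k) (UINT32_C(1) << (2 * (k)))
#define PKRU_WD(k) (UINT32_C(1) << (2 * (k) + 1))

enum rseq_outcome { RSEQ_OUTCOME_OK, RSEQ_OUTCOME_SIGSEGV };

static bool can_read(uint32_t pkru, int key)
{
    return !(pkru & PKRU_AD(key));
}

static bool can_write(uint32_t pkru, int key)
{
    return !(pkru & (PKRU_AD(key) | PKRU_WD(key)));
}

/*
 * Hypothetical model of the rule:
 *   pkru:         thread's PKRU when the kernel handles rseq
 *   rseq_key:     pkey of the page holding struct rseq (written by the kernel)
 *   cs_installed: whether the rseq_cs pointer in struct rseq is non-NULL
 *   cs_key:       pkey of the page holding struct rseq_cs (read by the kernel)
 */
static enum rseq_outcome model_rseq_outcome(uint32_t pkru, int rseq_key,
                                            bool cs_installed, int cs_key)
{
    /* The kernel temporarily grants full access to pkey 0 only. */
    uint32_t effective = pkru & ~(PKRU_AD(0) | PKRU_WD(0));

    if (!can_write(effective, rseq_key))
        return RSEQ_OUTCOME_SIGSEGV;
    if (cs_installed && !can_read(effective, cs_key))
        return RSEQ_OUTCOME_SIGSEGV;
    return RSEQ_OUTCOME_OK;
}
```

For example, with every key disabled in PKRU, a struct rseq on pkey 0 still works, while a struct rseq_cs on pkey 2 faults as long as it is installed.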
> 
> Makes sense. I will wait for Dave's comments/ack before sending v6. But
> to save a round-trip, does this look reasonable?
> 
> --- a/include/uapi/linux/rseq.h
> +++ b/include/uapi/linux/rseq.h
> @@ -58,6 +58,10 @@ struct rseq_cs {
>    * contained within a single cache-line.
>    *
>    * A single struct rseq per thread is allowed.
> + *
> + * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
> + * then the assigned pkey should either be accessible whenever these structs
> + * are registered/installed, or they should be protected with pkey 0.

The wording is OK with me.

Thanks,

Mathieu

>    */
>   struct rseq {
> 
> 
> 
>>>> 2) The pkey-0 requirement is only for struct rseq, which we can check
>>>>       for at rseq registration, and happens to be the fast path. For struct
>>>>       rseq_cs, this is not the same tradeoff: we cannot easily check its
>>>>       associated pkey because the rseq_cs pointer is updated by userspace
>>>>       when entering a critical section. But the good news is that reading
>>>>       the content of struct rseq_cs is *not* a fast-path: it's only done
>>>>       when preempting/delivering a signal over a thread which has a
>>>>       non-NULL rseq_cs pointer.
>>>
>>> rseq_cs is usually accessed on a hot path since the rseq_cs pointer is not
>>> cleared on critical section exit (at least that's what we do).
>>
>> Fair point.
>>
>>>
>>>>       Therefore reading the struct rseq_cs content should be done with
>>>>       write_permissive_pkey_val(), giving access to all pkeys.
>>>
>>> You just asked me to redo the code to simplify it, won't this
>>> complicate it back again? ;)
>>
>> I'm fine with the pkey-0 approach for both rseq and rseq_cs if Dave is
>> also OK with it.
> 
> It should work for my current use case, at least how I currently see
> it. Ways people use pkeys are pretty unique, so it's hard to
> extrapolate. But there is one more possibility: when a program
> switches pkeys, it may also clear the stale rseq_cs pointer from rseq.
> This way rseq_cs may have non-0 keys assigned, but they are always
> accessible while installed.
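The alternative described above (clear the stale rseq_cs pointer whenever the program switches pkeys) could look roughly like this in user space. It is a sketch under stated assumptions: `struct model_rseq` is a hypothetical stand-in for the rseq_cs field of the real struct rseq, and `model_pkru` stands in for the thread's PKRU, which a real program would update with glibc's `pkey_set()` or the WRPKRU instruction. The pointer is cleared before the switch, so the kernel never finds an installed rseq_cs that the new rights make unreadable.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the rseq_cs field of the thread's struct rseq. */
struct model_rseq {
    _Atomic uint64_t rseq_cs;  /* address of the current struct rseq_cs, or 0 */
};

/* Hypothetical stand-in for the thread's PKRU register. */
static uint32_t model_pkru;

/*
 * Switch the thread to new access rights, first dropping any stale rseq_cs
 * pointer. Must be called outside of any rseq critical section.
 */
static void switch_pkeys(struct model_rseq *rs, uint32_t new_pkru)
{
    /* Single-copy-atomic store: the kernel may read this field at any
     * preemption or signal delivery on this thread. */
    atomic_store_explicit(&rs->rseq_cs, 0, memory_order_relaxed);

    /* Compiler barrier: keep the clear ahead of the rights change. */
    atomic_signal_fence(memory_order_seq_cst);

    /* Real code would call pkey_set() per key or execute WRPKRU here. */
    model_pkru = new_pkru;
}
```

With this pattern an rseq_cs may carry a non-0 key, as long as that key is accessible whenever the descriptor is installed.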
> 
> 
> 
>> Thanks,
>>
>> Mathieu
>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>>
>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
>>>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>>>> Cc: "Paul E. McKenney" <paulmck@...nel.org>
>>>>> Cc: Boqun Feng <boqun.feng@...il.com>
>>>>> Cc: Thomas Gleixner <tglx@...utronix.de>
>>>>> Cc: Ingo Molnar <mingo@...hat.com>
>>>>> Cc: Borislav Petkov <bp@...en8.de>
>>>>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>>>>> Cc: "H. Peter Anvin" <hpa@...or.com>
>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
>>>>> Cc: x86@...nel.org
>>>>> Cc: linux-kernel@...r.kernel.org
>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>
>>>>> ---
>>>>> Changes in v4:
>>>>>     - Added Fixes tag
>>>>>
>>>>> Changes in v3:
>>>>>     - simplify control flow to always enable access to 0 pkey
>>>>>
>>>>> Changes in v2:
>>>>>     - fixed typos and reworded the comment
>>>>> ---
>>>>>     kernel/rseq.c | 11 +++++++++++
>>>>>     1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>>>>> index 2cb16091ec0ae..9d9c976d3b78c 100644
>>>>> --- a/kernel/rseq.c
>>>>> +++ b/kernel/rseq.c
>>>>> @@ -10,6 +10,7 @@
>>>>>
>>>>>     #include <linux/sched.h>
>>>>>     #include <linux/uaccess.h>
>>>>> +#include <linux/pkeys.h>
>>>>>     #include <linux/syscalls.h>
>>>>>     #include <linux/rseq.h>
>>>>>     #include <linux/types.h>
>>>>> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>>>>>     void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>>>>>     {
>>>>>         struct task_struct *t = current;
>>>>> +     pkey_reg_t saved_pkey;
>>>>>         int ret, sig;
>>>>>
>>>>>         if (unlikely(t->flags & PF_EXITING))
>>>>>                 return;
>>>>>
>>>>> +     /*
>>>>> +      * Enable access to the default (0) pkey in case the thread has
>>>>> +      * currently disabled access to it and struct rseq/rseq_cs has
>>>>> +      * 0 pkey assigned (the only supported value for now).
>>>>> +      */
>>>>> +     saved_pkey = enable_zero_pkey_val();
>>>>> +
>>>>>         /*
>>>>>          * regs is NULL if and only if the caller is in a syscall path.  Skip
>>>>>          * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
>>>>> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>>>>>         }
>>>>>         if (unlikely(rseq_update_cpu_node_id(t)))
>>>>>                 goto error;
>>>>> +     write_pkey_val(saved_pkey);
>>>>>         return;
>>>>>
>>>>>     error:
>>>>> +     write_pkey_val(saved_pkey);
>>>>>         sig = ksig ? ksig->sig : 0;
>>>>>         force_sigsegv(sig);
>>>>>     }
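The bracket this patch adds to __rseq_handle_notify_resume() (save the current value, open access to pkey 0, restore on both the success and the error path, restoring before the SIGSEGV is forced) can be modeled in user space as below. This assumes x86 PKRU semantics; enable_zero_pkey_val() and write_pkey_val() are not defined in this excerpt, so the helpers here (`model_enable_zero_pkey`, `model_handle_notify_resume`) are hypothetical and only mirror the control flow shown in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PKRU bits for pkey 0: Access-Disable (bit 0), Write-Disable (bit 1). */
#define PKRU_KEY0_BITS UINT32_C(0x3)

static uint32_t model_pkru;        /* stands in for the thread's PKRU */
static bool model_sigsegv_forced;  /* records a force_sigsegv()-like outcome */

/* Hypothetical: grant full access to pkey 0 and return the previous value. */
static uint32_t model_enable_zero_pkey(void)
{
    uint32_t saved = model_pkru;
    model_pkru = saved & ~PKRU_KEY0_BITS;
    return saved;
}

/* Mirrors the patched control flow. rseq_update stands in for the
 * rseq_ip_fixup()/rseq_update_cpu_node_id() calls; nonzero means error. */
static void model_handle_notify_resume(int (*rseq_update)(uint32_t pkru))
{
    uint32_t saved = model_enable_zero_pkey();

    if (rseq_update(model_pkru))
        goto error;
    model_pkru = saved;           /* write_pkey_val(saved_pkey) on success */
    return;

error:
    model_pkru = saved;           /* restore before the signal is raised */
    model_sigsegv_forced = true;
}
```

Restoring before the signal is forced means the SIGSEGV is delivered with the thread's original access rights, not with the temporarily widened ones.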


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com