[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2e634c94-c9b0-40d2-ad51-c91c3ec36382@efficios.com>
Date: Fri, 21 Feb 2025 14:41:32 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: peterz@...radead.org, boqun.feng@...il.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
aruna.ramakrishna@...cle.com, elver@...gle.com,
"Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys
On 2025-02-21 06:22, Dmitry Vyukov wrote:
> On Tue, 18 Feb 2025 at 16:37, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>
>> On Tue, 18 Feb 2025 at 16:27, Mathieu Desnoyers
>> <mathieu.desnoyers@...icios.com> wrote:
>>>
>>> On 2025-02-18 10:10, Dmitry Vyukov wrote:
>>>> On Tue, 18 Feb 2025 at 15:57, Mathieu Desnoyers
>>>> <mathieu.desnoyers@...icios.com> wrote:
>>>>>
>>>>> On 2025-02-18 02:55, Dmitry Vyukov wrote:
>>>>>> On Mon, 17 Feb 2025 at 21:21, Mathieu Desnoyers
>>>>> [...]
>>>>>>>
>>>>>>> we still let this function read the rseq_cs.
>>>>>>>> + * It's unclear what benefits the resticted code gets by doing this
>>>>>>>
>>>>>>> restricted
>>>>>>>
>>>>>>>> + * (it probably already hijacked control flow at this point), and
>>>>>>>> + * presumably any sane sandbox should prohibit restricted code
>>>>>>>> + * from accessing struct rseq, and this is still better than
>>>>>>>> + * terminating the app unconditionally (it always has a choice
>>>>>>>> + * of not using rseq and pkeys together).
>>>>>>>
>>>>>>> Note that because userspace can complete an rseq critical section
>>>>>>> without clearing the rseq_cs pointer, this could happen simply because
>>>>>>> the kernel is preempting the task after it has:
>>>>>>>
>>>>>>> 1) completed an rseq critical section, without clearing rseq_cs,
>>>>>>> 2) changed pkey.
>>>>>>>
>>>>>>> So allowing this is important, and I would remove the comment about
>>>>>>> hijacked control flow and such. This can happen with normal use of the
>>>>>>> ABI.
>>>>>>
>>>>>> Thanks for the review!
>>>>>>
>>>>>> I've addressed all comments in the series in v2.
>>>>>>
>>>>>> I've reworded this paragraph to simplify sentences, but I still kept
>>>>>> the note aboud malicious rseq_cs.
>>>>>>
>>>>>> If we would not be circumventing normal protection, then, yes, these
>>>>>> cases would be the same. But since we are circumventing protection
>>>>>> that otherwise exists, I think it's important to think about
>>>>>> potentially malicious cases. In this context inaccessible rseq_cs
>>>>>> values that resulted from normal execution are very different from
>>>>>> malicious onces. Normal ones will point to a fixed set of real
>>>>>> well-formed rseq_cs objects, while malicious ones may point to
>>>>>> god-knows-where in an attempt of an attacker to do things we can't
>>>>>> even imagine right now (e.g. rseq_cs overlapping with protected crypto
>>>>>> keys).
>>>>>>
>>>>>> It's as if a particular instance of copy_to_user would allow
>>>>>> user-space to write arbitrary kernel memory, and memory of other
>>>>>> processes circumventing all normal protections. In that context we
>>>>>> would need to be very careful regarding what we actually allow.
>>>>>
>>>>> I'm considering that we should clear the rseq_cs pointer whenever
>>>>> userspace issues pkey_mprotect.
>>>>>
>>>>> This would ensure that no legitimate scenario can trigger a load
>>>>> from a rseq_cs area which has the wrong pkey, and therefore we
>>>>> could accept read/write from/to a struct rseq which has the wrong
>>>>> pkey, but kill the process if trying to read/write from a
>>>>> struct rseq_cs with the wrong key. This would prevent userspace
>>>>> from making the kernel read/write from/to memory with the wrong
>>>>> pkey through a pointer it controls (rseq_cs pointer).
>>>>>
>>>>> Thoughts ?
>>>>
>>>> I am not following.
>>>>
>>>> There are pkey_mprotect calls, then independently installs on rseq_cs
>>>> pointers that happen concurrently and after pkey_mprotect, and
>>>> independent set of pkey_set calls that happens concurrently and after
>>>> the previous 2.
>>>> I don't see how doing something at the pkey_mprotect call for the
>>>> single thread avoids any scenarios.
>>>
>>> Hrm. Sorry, I mixed up pkey_set() vs pkey_mprotect(). What I had in mind
>>> was actually pkey_set(). And that would need to clear rseq_cs for all
>>> threads belonging to the process, which may not be straightforward
>>> because those could legitimately be inside a rseq critical section.
>>>
>>> OK, let's try another approach: rather than kill the process if
>>> read/write of the rseq_cs area with the wrong key fails, could we simply
>>> clear the rseq_cs pointer in that case ? Technically there would be no
>>> legitimate use of this except for the case where it is meant to be lazily
>>> cleared.
>>
>> This may work, but 2 concerns with this:
>> 1. We don't know if the failure happened due to pkeys or not (at least
>> not easily), and I am afraid of touching the logic for other failures.
>> If the rseq_cs was a bogus pointer, or protected with normal mprotect,
>> what does it mean? Are we masking a programming bug? Are we
>> circumventing some other protections that were supposed to lead to the
>> process termination?
>> 2. This will complicate __rseq_handle_notify_resume() logic as it
>> would need to handle failures when accessing rseq and rseq_cs
>> differently (+plus there is signature check). The more complex the
>> logic, the higher chances of adding a bug now or in future.
>
>
> Do you see any problem with the current code? What exactly is it?
> If not, can we merge it as is?
After discussion with Dave Hansen, it appears that pkey is not really
meant to be secure against a malicious actor nowadays, it's more a
page protection tool that helps identifying bad memory use.
So I don't think we need to care about the security-related concerns
I raised above.
Please see my comment in reply to Dave about making the code simpler.
Once that is done, please send an updated patch, and the rest should
be OK.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Powered by blists - more mailing lists