[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <c36eb441-e93e-4beb-bdf2-1e6285f7a187@www.fastmail.com>
Date: Wed, 24 Aug 2022 09:45:21 -0700
From: "Andy Lutomirski" <luto@...nel.org>
To: Stephen Röttger <sroettger@...gle.com>
Cc: "Dave Hansen" <dave.hansen@...el.com>,
"Kees Cook" <keescook@...omium.org>,
"Dave Hansen" <dave.hansen@...ux.intel.com>,
"the arch/x86 maintainers" <x86@...nel.org>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
"Jann Horn" <jannh@...gle.com>
Subject: Re: PKU usage improvements for threads
On Wed, Aug 24, 2022, at 1:51 AM, Stephen Röttger wrote:
> On Tue, Aug 23, 2022 at 8:24 PM Andy Lutomirski <luto@...nel.org> wrote:
>>
>>
>>
>> On Tue, Aug 23, 2022, at 11:12 AM, Dave Hansen wrote:
>> > On 8/23/22 04:08, Stephen Röttger wrote:
>> >> On Mon, Aug 22, 2022 at 11:11 PM Dave Hansen <dave.hansen@...el.com> wrote:
>> >>> On 8/22/22 13:40, Kees Cook wrote:
>> >>>> 1) It appears to be a bug that a thread without the correct PK can make
>> >>>> VMAs covered by a separate PK, out from under other threads. (e.g. mmap
>> >>>> a new mapping to wipe out the defined PK for it.) It seems that PK checks
>> >>>> should be made when modifying VMAs.
>> >>>
>> >>> Could you give an example of this? Is this something along the lines of
>> >>> a mmap(MAP_FIXED) wiping out an earlier mapping? Or, is it more subtle
>> >>> than that?
>> >>
>> >> Yes, that's one example. And the same applies to other operations on the
>> >> VMA. E.g. another case we'd like to prevent would be munmap(addr) where
>> >> addr is covered by a pkey to which the calling thread doesn't have access
>> >> permissions to.
>> >
>> > Yeah, that's something for which our defenses are quite weak. But, it
>> > also calls for a very generic mm/ solution and not something specific at
>> > all to pkeys.
>
> We were also thinking about if this should be a more generic feature instead of
> being tied to pkeys. I.e. the doc above has an alternative proposal to introduce
> something like a memory seal/unseal syscall.
> I was personally leaning towards using pkeys for this for a few reasons:
> * intuitively it would make sense to me to extend PKEY_DISABLE_ACCESS
> to also mean disable all changes to the memory, not just the data.
> * We couldn't come up with another use case for the more generic API that
> doesn't include pkeys.
> * Fewer syscalls for our use case, since we already set the pkey on the mappings
> we want to become immutable.
> That being said, either approach works for us.
>
>> > The concept would make a good lightning talk at Plumbers of LSF/MM.
>>
>> This kind of thing seems questionable to me. If the attacker controls syscall arguments, they can do almost anything. ISTM a CFI scheme should aim to prevent that bogus call in the first place, e.g. by preventing a problematic call.
What I'm trying to say is: CFI, by itself, can protect syscalls by making sure that callers are safe. So, for example, if all munmap() callers do:
if (addr is dangerous)
abort();
else
munmap();
Then, with CFI, an attacker can't get to the actual munmap without first doing the dangerous check. And you can implement this entirely in user code.
With syscall user dispatch (this thing: https://lwn.net/Articles/828510/ -- sorry, I meant that when I typed interception), you even have a way to intercept *all* munmap() calls, for example.
Regardless, I think it would be reasonable to have a nice way to create a sigaltstack and make it harder for the stack and the associated code to get corrupted. I"m not sure that pkeys are the right solution, but a nice coherent package that makes it difficult to accidentally change what happens when a specific signal is delivered could be quite powerful.
>
> I'm not sure that CFI can solve this problem since the attacker
> doesn't change the
> control flow in this attack. We will definitely have to ensure that
> certain syscall
> arguments can't be controlled by the attacker, e.g. if the protection
> bits of an mmap
> syscall are ever spilled to the stack, that would be one way to bypass
> the mitigation
> and we have to ensure in userspace that this doesn't happen.
> However, this seems infeasible for common syscalls like munmap. If any thread
> wants to unmap a page, it's likely that the data for that came from a
> place that the
> attacker might have corrupted.
>
>> Which makes me think that the actual solution is to have syscall interception support changing PKRU, perhaps via sigaltstack.
>
> Can you expand on this? I didn't really understand what you meant.
>
>>
>> >
>> >>>> 2) It would be very helpful to have a mechanism for the signal stack to
>> >>>> be PK aware, in the sense that the kernel would switch to a predefined
>> >>>> PK. i.e. having a new interface to sigaltstack() which includes a PK.
>> >>>
>> >>> Are you thinking that when switching to the sigaltstack that it would
>> >>> also pick up a specific PKRU value? Or, that it would ensure that PKRU
>> >>> allows access to the sigaltstack's pkey?
>> >>
>> >> Either of those would work for us.
>> >>
>> >>> Logically something like this:
>> >>>
>> >>> stack_t sas = {
>> >>> ss_sp = stack_ptr;
>> >>> ss_flags = ... flags;
>> >>> ss_size = ...;
>> >>> ss_pkey = 12;
>> >>> };
>> >>>
>> >>> Then the kernel would set up RSP to point to ss_sp, and do (logically):
>> >>>
>> >>> pkkru &= ~(3<<(12*2)); // clear Write and Access-disable for pkey-12
>> >>>
>> >>> before building the signal frame running the signal handler?
>> >>
>> >> Yeah, that would work for our use case.
>> >> We also have a doc discussing this in more detail :) :
>> >
>> > That doesn't seem like it would be too much of a stretch. There's a
>> > delicate point when building the stack frame that the kernel would need
>> > to move over to the new PKRU value to build the frame before it writes
>> > the *OLD* value to the frame. But, it's far from impossible.
>> >
>> > I also bet we could do this with minimal new ABI. There's already a
>> > ->ss_flags field. We could assign a flag to mean that stack_t doesn't
>> > end at '->ss_size' and that there's a pkey value *after* ss_size. I do
>> > think having a single pkey that is made accessible before signal entry
>> > is a more flexible ABI than taking an explicit PKRU value.
>> >
>> > I think that would allow just reusing sys_sigaltstack().
>>
>> sys_sigaltstack() is already pretty much useless with SHSTK, and it’s kinda busted with AVX512. How about we just add a whole new non-kludgy API?
>
> Attachments:
> * smime.p7s
Powered by blists - more mailing lists