linux-kernel - Re: PKU usage improvements for threads


Message-Id: <26078f2a-67be-4aa1-bbb2-dcd1168c9d12@www.fastmail.com>
Date:   Tue, 23 Aug 2022 11:24:13 -0700
From:   "Andy Lutomirski" <luto@...nel.org>
To:     "Dave Hansen" <dave.hansen@...el.com>,
        Stephen Röttger <sroettger@...gle.com>
Cc:     "Kees Cook" <keescook@...omium.org>,
        "Dave Hansen" <dave.hansen@...ux.intel.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Subject: Re: PKU usage improvements for threads



On Tue, Aug 23, 2022, at 11:12 AM, Dave Hansen wrote:
> On 8/23/22 04:08, Stephen Röttger wrote:
>> On Mon, Aug 22, 2022 at 11:11 PM Dave Hansen <dave.hansen@...el.com> wrote:
>>> On 8/22/22 13:40, Kees Cook wrote:
>>>> 1) It appears to be a bug that a thread without the correct PK can modify
>>>> VMAs covered by a separate PK, out from under other threads. (e.g. mmap
>>>> a new mapping to wipe out the defined PK for it.) It seems that PK checks
>>>> should be made when modifying VMAs.
>>>
>>> Could you give an example of this?  Is this something along the lines of
>>> a mmap(MAP_FIXED) wiping out an earlier mapping?  Or, is it more subtle
>>> than that?
>> 
>> Yes, that's one example. And the same applies to other operations on the
>> VMA. E.g. another case we'd like to prevent would be munmap(addr) where
>> addr is covered by a pkey to which the calling thread doesn't have access
>> permissions.
>
> Yeah, that's something for which our defenses are quite weak.  But, it
> also calls for a very generic mm/ solution and not something specific at
> all to pkeys.
>
> I assume that you wouldn't want to turn off *all* mmap(), MAP_FIXED or
> munmap() in the process.  You just want to make one or more VMAs more or
> less immutable.  That _sounds_ like a topic that would have been broached at
> some point in the past, although it doesn't ring any bells.
>
> The concept would make a good lightning talk at Plumbers or LSF/MM.

This kind of thing seems questionable to me. If the attacker controls syscall arguments, they can do almost anything. ISTM a CFI scheme should aim to prevent the bogus call from being made in the first place.

Which makes me think that the actual solution is to have syscall interception support changing PKRU, perhaps via sigaltstack.

>
>>>> 2) It would be very helpful to have a mechanism for the signal stack to
>>>> be PK aware, in the sense that the kernel would switch to a predefined
>>>> PK. i.e. having a new interface to sigaltstack() which includes a PK.
>>>
>>> Are you thinking that when switching to the sigaltstack that it would
>>> also pick up a specific PKRU value?  Or, that it would ensure that PKRU
>>> allows access to the sigaltstack's pkey?
>> 
>> Either of those would work for us.
>> 
>>> Logically something like this:
>>>
>>>         stack_t sas = {
>>>                 .ss_sp = stack_ptr,
>>>                 .ss_flags = ... flags,
>>>                 .ss_size = ...,
>>>                 .ss_pkey = 12,
>>>         };
>>>
>>> Then the kernel would set up RSP to point to ss_sp, and do (logically):
>>>
>>>    pkru &= ~(3<<(12*2)); // clear Write-disable and Access-disable for pkey 12
>>>
>>> before building the signal frame and running the signal handler?
>> 
>> Yeah, that would work for our use case.
>> We also have a doc discussing this in more detail :) :
>
> That doesn't seem like it would be too much of a stretch.  There's a
> delicate point when building the stack frame that the kernel would need
> to move over to the new PKRU value to build the frame before it writes
> the *OLD* value to the frame.  But, it's far from impossible.
>
> I also bet we could do this with minimal new ABI.  There's already a
> ->ss_flags field.  We could assign a flag to mean that stack_t doesn't
> end at '->ss_size' and that there's a pkey value *after* ss_size.  I do
> think having a single pkey that is made accessible before signal entry
> is a more flexible ABI than taking an explicit PKRU value.
>
> I think that would allow just reusing sys_sigaltstack().

sys_sigaltstack() is already pretty much useless with SHSTK, and it’s kinda busted with AVX512. How about we just add a whole new non-kludgy API?
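
For reference, the PKRU arithmetic in the quoted pseudocode can be sketched as a small C helper. This is only an illustration of the bit layout (two bits per key: Access-Disable in the low bit, Write-Disable in the high bit); the macro and function names here are made up for the example and are not an existing kernel or libc interface.

```c
#include <assert.h>
#include <stdint.h>

/* PKRU holds two bits per protection key:
 *   bit 2*k     = Access-Disable (AD) for key k
 *   bit 2*k + 1 = Write-Disable  (WD) for key k
 * x86 has 16 keys, so PKRU is 32 bits wide. */
#define PKEY_COUNT 16
#define PKRU_AD_BIT(pkey) (UINT32_C(1) << (2 * (pkey)))
#define PKRU_WD_BIT(pkey) (UINT32_C(2) << (2 * (pkey)))

/* Hypothetical helper: return a PKRU value that allows both reads and
 * writes through `pkey`, leaving the bits of all other keys unchanged.
 * This is the "pkru &= ~(3 << (pkey * 2))" step from the discussion. */
static inline uint32_t pkru_allow_pkey(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < PKEY_COUNT);
    return pkru & ~(PKRU_AD_BIT(pkey) | PKRU_WD_BIT(pkey));
}

/* Hypothetical helper: nonzero if `pkru` permits writes through `pkey`
 * (neither AD nor WD is set for that key). */
static inline int pkru_can_write(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < PKEY_COUNT);
    return (pkru & (PKRU_AD_BIT(pkey) | PKRU_WD_BIT(pkey))) == 0;
}
```

With a fully restrictive value such as 0xFFFFFFFF, pkru_allow_pkey(pkru, 12) clears only bits 24 and 25, which is what the kernel would logically need to do before writing the signal frame to a stack tagged with pkey 12.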