Message-ID: <0ec67737-3c09-ba5c-f840-9ed02a0ea6bf@gmail.com>
Date: Mon, 23 Jan 2023 19:48:40 +0200
From: Topi Miettinen <toiwoton@...il.com>
To: Catalin Marinas <catalin.marinas@....com>,
David Hildenbrand <david@...hat.com>
Cc: Joey Gouly <joey.gouly@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Lennart Poettering <lennart@...ttering.net>,
Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>,
Alexander Viro <viro@...iv.linux.org.uk>,
Kees Cook <keescook@...omium.org>,
Szabolcs Nagy <szabolcs.nagy@....com>,
Mark Brown <broonie@...nel.org>,
Jeremy Linton <jeremy.linton@....com>, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-abi-devel@...ts.sourceforge.net, nd@....com, shuah@...nel.org
Subject: Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl

On 23.1.2023 18.04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>> }
>>>>> unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + *   a) mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *   b) mmap(PROT_WRITE)
>>>>> + *      mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *   c) mmap(PROT_WRITE)
>>>>> + *      mprotect(PROT_READ)
>>>>> + *      mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *   d) mmap(PROT_READ | PROT_EXEC)
>>>>> + *      mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
>
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
>
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
>
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
>
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
>
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
>
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).
>
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
>
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.
>
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
>
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.
>
>> Having that said, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid it afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
>
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.
>
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
>
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).
>
I don't think so. For 100% compatibility with seccomp, the same cases of
mprotect() use should be blocked regardless of the file descriptor used.
There could be more relaxed PR_MDWE_* controls in the future if needed.
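
To make the cases a) to d) from the patch comment concrete, here is a rough userspace model of the decision as I understand it from this thread. It is only a sketch, not the kernel code: the MODEL_* values and the model_* function names are made up for illustration, and a fresh mmap() is modelled by treating the old and new protections as identical.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's VM_* bits; these names and
 * values are assumptions for this userspace model only. */
#define MODEL_VM_READ  0x1UL
#define MODEL_VM_WRITE 0x2UL
#define MODEL_VM_EXEC  0x4UL

/*
 * Hypothetical model of the MDWE check. old_flags is the protection the
 * mapping currently has, new_flags is the protection being requested.
 * Returns true when the request would be denied.
 */
static inline bool model_mdwe_denies(unsigned long old_flags,
				     unsigned long new_flags)
{
	/* Case a): writable and executable at the same time. */
	if ((new_flags & MODEL_VM_WRITE) && (new_flags & MODEL_VM_EXEC))
		return true;

	/* Cases b) and c): gaining exec on a mapping that is not
	 * already executable. */
	if (!(old_flags & MODEL_VM_EXEC) && (new_flags & MODEL_VM_EXEC))
		return true;

	return false;
}

/* A fresh mmap() has no previous protection to compare against, so the
 * model passes the requested flags as both old and new. */
static inline bool model_mmap_denied(unsigned long prot)
{
	return model_mdwe_denies(prot, prot);
}

/* mprotect() compares the existing protection with the requested one. */
static inline bool model_mprotect_denied(unsigned long old_flags,
					 unsigned long new_flags)
{
	return model_mdwe_denies(old_flags, new_flags);
}
```

With this model, model_mprotect_denied(MODEL_VM_READ, MODEL_VM_EXEC) is true, which matches the seccomp behaviour mentioned above (mprotect(PROT_EXEC) is refused even if the mmap was PROT_READ only), while re-protecting an already executable mapping stays allowed.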

Updated systemd PR: https://github.com/systemd/systemd/pull/25276

I wish there were highly granular access controls for /proc, including
/proc/self and /proc/sys/*. Now the best options are to use mount
namespaces and/or SELinux, but they aren't too good for that.

-Topi