linux-kernel - Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8b4e31cf-de20-703c-4b53-ad86d4282a37@redhat.com>
Date:   Mon, 23 Jan 2023 13:53:46 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Joey Gouly <joey.gouly@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lennart Poettering <lennart@...ttering.net>,
        Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Szabolcs Nagy <szabolcs.nagy@....com>,
        Mark Brown <broonie@...nel.org>,
        Jeremy Linton <jeremy.linton@....com>,
        Topi Miettinen <toiwoton@...il.com>, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-abi-devel@...ts.sourceforge.net, nd@....com, shuah@...nel.org
Subject: Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl

On 23.01.23 13:19, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>> On 19.01.23 17:03, Joey Gouly wrote:
>>> The aim of such policy is to prevent a user task from creating an
>>> executable mapping that is also writeable.
>>>
>>> An example of mmap() returning -EACCESS if the policy is enabled:
>>>
>>> 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
>>>
>>> Similarly, mprotect() would return -EACCESS below:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
>>>
>>> The BPF filter that systemd MDWE uses is stateless, and disallows
>>> mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
>>> be enabled if it was already PROT_EXEC, which allows the following case:
>>>
>>> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> where PROT_BTI enables branch tracking identification on arm64.
>>>
>>> Signed-off-by: Joey Gouly <joey.gouly@....com>
>>> Co-developed-by: Catalin Marinas <catalin.marinas@....com>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@....com>
>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>>> ---
>>>    include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
>>>    include/linux/sched/coredump.h |  6 +++++-
>>>    include/uapi/linux/prctl.h     |  6 ++++++
>>>    kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
>>>    mm/mmap.c                      | 10 ++++++++++
>>>    mm/mprotect.c                  |  5 +++++
>>>    6 files changed, 93 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>> index 58b3abd457a3..cee1e4b566d8 100644
>>> --- a/include/linux/mman.h
>>> +++ b/include/linux/mman.h
>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>    }
>>>    unsigned long vm_commit_limit(void);
>>> +
>>> +/*
>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>> + *
>>> + * This denies the following:
>>> + *
>>> + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
>>> + *
>>> + *	b)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + *	c)	mmap(PROT_WRITE)
>>> + *		mprotect(PROT_READ)
>>> + *		mprotect(PROT_EXEC)
>>> + *
>>> + * But allows the following:
>>> + *
>>> + *	d)	mmap(PROT_READ | PROT_EXEC)
>>> + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>> + */
>>
>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>> anymore? In an ideal world, there would be no further mprotect changes
>> required.
> 
> I don't think it works for this scenario. We don't want to disable
> PROT_EXEC entirely, only disallow it if the mapping is not already
> executable. The below should be allowed:
> 
> 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> 
> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
> time.

Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and 
disallow VM_EXEC of course). But I guess we'd have to go one step 
further: if we allow exec access at mmap time, clear VM_MAYWRITE (and 
disallow VM_WRITE of course).

That at least would be then similar to how we handle mmaped files: if 
the file is not executable, we clear VM_MAYEXEC. If the file is not 
writable, we clear VM_MAYWRITE.


Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to 
such memory would be forbidden, which might also be what we are trying 
to achieve, or is that expected to still work? But clearing VM_MAYWRITE 
would mean that is_cow_mapping() would no longer fire for some VMAs, and 
we'd have to check if that's fine in all cases.


Having that said, this patch handles the case when the prctl is applied 
to a process after already having created some writable or executable 
mappings, to at least forbid if afterwards on these mappings. What is 
expected to happen if the process already has writable mappings that are 
executable at the time we enable the prctl?

Clarifying what the expected semantics with /proc/self/mem are would be 
nice.

> 
> We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
> mmap() or mprotect()) but IIRC we concluded that this should be an
> additional prctl() flag.

Yes, I agree with enabling this restriction on a per-process level. My 
remarks only apply to processes with this toggle enabled.

-- 
Thanks,

David / dhildenb