linux-kernel - Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y857Uoq7fjO0lZ12@arm.com>
Date:   Mon, 23 Jan 2023 12:19:30 +0000
From:   Catalin Marinas <catalin.marinas@....com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Joey Gouly <joey.gouly@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lennart Poettering <lennart@...ttering.net>,
        Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Szabolcs Nagy <szabolcs.nagy@....com>,
        Mark Brown <broonie@...nel.org>,
        Jeremy Linton <jeremy.linton@....com>,
        Topi Miettinen <toiwoton@...il.com>, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-abi-devel@...ts.sourceforge.net, nd@....com, shuah@...nel.org
Subject: Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl

On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
> On 19.01.23 17:03, Joey Gouly wrote:
> > The aim of such policy is to prevent a user task from creating an
> > executable mapping that is also writeable.
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > The BPF filter that systemd MDWE uses is stateless, and disallows
> > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
> > be enabled if it was already PROT_EXEC, which allows the following case:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > Signed-off-by: Joey Gouly <joey.gouly@....com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@....com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@....com>
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > ---
> >   include/linux/mman.h           | 34 ++++++++++++++++++++++++++++++++++
> >   include/linux/sched/coredump.h |  6 +++++-
> >   include/uapi/linux/prctl.h     |  6 ++++++
> >   kernel/sys.c                   | 33 +++++++++++++++++++++++++++++++++
> >   mm/mmap.c                      | 10 ++++++++++
> >   mm/mprotect.c                  |  5 +++++
> >   6 files changed, 93 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..cee1e4b566d8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
> >   }
> >   unsigned long vm_commit_limit(void);
> > +
> > +/*
> > + * Denies creating a writable executable mapping or gaining executable permissions.
> > + *
> > + * This denies the following:
> > + *
> > + * 	a)	mmap(PROT_WRITE | PROT_EXEC)
> > + *
> > + *	b)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + *	c)	mmap(PROT_WRITE)
> > + *		mprotect(PROT_READ)
> > + *		mprotect(PROT_EXEC)
> > + *
> > + * But allows the following:
> > + *
> > + *	d)	mmap(PROT_READ | PROT_EXEC)
> > + *		mmap(PROT_READ | PROT_EXEC | PROT_BTI)
> > + */
> 
> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
> anymore? In an ideal world, there would be no further mprotect changes
> required.

I don't think it works for this scenario. We don't want to disable
PROT_EXEC entirely, only disallow it if the mapping is not already
executable. The below should be allowed:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
time.

We could clear VM_MAYEXEC if the mapping was made VM_WRITE (either by
mmap() or mprotect()) but IIRC we concluded that this should be an
additional prctl() flag. This series aims to be pretty much a drop-in
replacement for the systemd's MDWE SECCOMP feature (but allowing
PROT_BTI).

-- 
Catalin