lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKbZUD2A+=bp_sd+Q0Yif7NJqMu8p__eb4yguq0agEcmLH8SDQ@mail.gmail.com>
Date: Tue, 17 Oct 2023 16:29:58 +0100
From: Pedro Falcato <pedro.falcato@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: jeffxu@...omium.org, akpm@...ux-foundation.org, keescook@...omium.org, 
	sroettger@...gle.com, jeffxu@...gle.com, jorgelo@...omium.org, 
	groeck@...omium.org, linux-kernel@...r.kernel.org, 
	linux-kselftest@...r.kernel.org, linux-mm@...ck.org, jannh@...gle.com, 
	surenb@...gle.com, alex.sierra@....com, apopple@...dia.com, 
	aneesh.kumar@...ux.ibm.com, axelrasmussen@...gle.com, ben@...adent.org.uk, 
	catalin.marinas@....com, david@...hat.com, dwmw@...zon.co.uk, 
	ying.huang@...el.com, hughd@...gle.com, joey.gouly@....com, corbet@....net, 
	wangkefeng.wang@...wei.com, Liam.Howlett@...cle.com, 
	torvalds@...ux-foundation.org, lstoakes@...il.com, mawupeng1@...wei.com, 
	linmiaohe@...wei.com, namit@...are.com, peterx@...hat.com, 
	peterz@...radead.org, ryan.roberts@....com, shr@...kernel.io, vbabka@...e.cz, 
	xiujianfeng@...wei.com, yu.ma@...el.com, zhangpeng362@...wei.com, 
	dave.hansen@...el.com, luto@...nel.org, linux-hardening@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

On Mon, Oct 16, 2023 at 4:18 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@...omium.org wrote:
> > Modern CPUs support memory permissions such as RW and NX bits. Linux has
> > supported NX since the release of kernel version 2.6.8 in August 2004 [1].
>
> This seems like a confusing way to introduce the subject.  Here, you're
> talking about page permissions, whereas (as far as I can tell), mseal() is
> about making _virtual_ addresses immutable, for some value of immutable.
>
> > Memory sealing additionally protects the mapping itself against
> > modifications. This is useful to mitigate memory corruption issues where
> > a corrupted pointer is passed to a memory management syscall. For example,
> > such an attacker primitive can break control-flow integrity guarantees
> > since read-only memory that is supposed to be trusted can become writable
> > or .text pages can get remapped. Memory sealing can automatically be
> > applied by the runtime loader to seal .text and .rodata pages and
> > applications can additionally seal security critical data at runtime.
> > A similar feature already exists in the XNU kernel with the
> > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > patchset has been designed to be compatible with the Chrome use case.
>
> This [2] seems very generic and wide-ranging, not helpful.  [5] was more
> useful to understand what you're trying to do.
>
> > The new mseal() is an architecture independent syscall, and with
> > following signature:
> >
> > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> >
> > addr/len: memory range.  Must be continuous/allocated memory, or else
> > mseal() will fail and no VMA is updated. For details on acceptable
> > arguments, please refer to comments in mseal.c. Those are also fully
> > covered by the selftest.
>
> Mmm.  So when you say "continuous/allocated" what you really mean is
> "Must have contiguous VMAs" rather than "All pages in this range must
> be populated", yes?
>
> > types: bit mask to specify which syscall to seal, currently they are:
> > MM_SEAL_MSEAL 0x1
> > MM_SEAL_MPROTECT 0x2
> > MM_SEAL_MUNMAP 0x4
> > MM_SEAL_MMAP 0x8
> > MM_SEAL_MREMAP 0x10
>
> I don't understand why we want this level of granularity.  The OpenBSD
> and XNU examples just say "This must be immutable*".  For values of
> immutable that allow downgrading access (eg RW to RO or RX to RO),
> but not upgrading access (RW->RX, RO->*, RX->RW).
>
> > Each bit represents sealing for one specific syscall type, e.g.
> > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > is that the API is extendable, i.e. when needed, the sealing can be
> > extended to madvise, mlock, etc. Backward compatibility is also easy.
>
> Honestly, it feels too flexible.  Why not just two flags to mprotect()
> -- PROT_IMMUTABLE and PROT_DOWNGRADABLE.  I can see a use for that --
> maybe for some things we want to be able to downgrade and for other
> things, we don't.

I think it's worth pointing out that this suggestion (with PROT_*)
could easily integrate with mmap() and as such allow for one-shot
mmap() + mseal().
If we consider the common case as 'addr = mmap(...); mseal(addr);', it
definitely sounds like a performance win as we halve the number of
syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
like common patterns.

-- 
Pedro

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ