[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <v22gngid25vcufvdfbv3pdymq3s72c47pizr23tkrmbbyiqoe4@y5yxseh6thnf>
Date: Tue, 16 Apr 2024 10:59:34 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
jeffxu@...omium.org
Cc: keescook@...omium.org, jannh@...gle.com, sroettger@...gle.com,
willy@...radead.org, gregkh@...uxfoundation.org,
usama.anjum@...labora.com, corbet@....net, surenb@...gle.com,
merimus@...gle.com, rdunlap@...radead.org, jeffxu@...gle.com,
jorgelo@...omium.org, groeck@...omium.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-mm@...ck.org, pedro.falcato@...il.com, dave.hansen@...el.com,
linux-hardening@...r.kernel.org, deraadt@...nbsd.org
Subject: Re: [PATCH v10 2/5] mseal: add mseal syscall
* jeffxu@...omium.org <jeffxu@...omium.org> [240415 12:35]:
> From: Jeff Xu <jeffxu@...omium.org>
>
> The new mseal() is an syscall on 64 bit CPU, and with
> following signature:
>
> int mseal(void addr, size_t len, unsigned long flags)
> addr/len: memory range.
> flags: reserved.
>
> mseal() blocks following operations for the given memory range.
>
> 1> Unmapping, moving to another location, and shrinking the size,
> via munmap() and mremap(), can leave an empty space, therefore can
> be replaced with a VMA with a new set of attributes.
>
> 2> Moving or expanding a different VMA into the current location,
> via mremap().
>
> 3> Modifying a VMA via mmap(MAP_FIXED).
>
> 4> Size expansion, via mremap(), does not appear to pose any specific
> risks to sealed VMAs. It is included anyway because the use case is
> unclear. In any case, users can rely on merging to expand a sealed VMA.
>
> 5> mprotect() and pkey_mprotect().
>
> 6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
> memory, when users don't have write permission to the memory. Those
> behaviors can alter region contents by discarding pages, effectively a
> memset(0) for anonymous memory.
>
> Following input during RFC are incooperated into this patch:
>
> Jann Horn: raising awareness and providing valuable insights on the
> destructive madvise operations.
> Linus Torvalds: assisting in defining system call signature and scope.
> Liam R. Howlett: perf optimization.
> Theo de Raadt: sharing the experiences and insight gained from
> implementing mimmutable() in OpenBSD.
>
> Finally, the idea that inspired this patch comes from Stephen Röttger’s
> work in Chrome V8 CFI.
No per-vma change is checked prior to entering a per-vma modification
loop today. This means that mseal() differs in behaviour in "up-front
failure" vs "partial change failure" that exists in every other
function.
I'm not saying it's wrong or that it's right - I'm just wondering what
the direction is here. Either we should do as much up-front as
possible or keep with tradition and have (partial) success where
possible.
If you look at do_mprotect_pkey(), you can even see
map_deny_write_exec() being checked in a loop during modifications.
I think we can all agree that having some up-front and some later
without any reason will lead to a higher probability of things getting
missed.
Thanks,
Liam
Powered by blists - more mailing lists