[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <692f9624-e440-4cf2-8202-861c679ddb73@lucifer.local>
Date: Fri, 25 Jul 2025 08:01:59 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Jeff Xu <jeffxu@...omium.org>
Subject: Re: [PATCH v3 2/5] mm/mseal: update madvise() logic
On Fri, Jul 25, 2025 at 12:12:54AM +0200, David Hildenbrand wrote:
> On 16.07.25 19:38, Lorenzo Stoakes wrote:
> > The madvise() logic is inexplicably performed in mm/mseal.c - this ought
> > to be located in mm/madvise.c.
> >
> > Additionally can_modify_vma_madv() is inconsistently named and, in
> > combination with is_ro_anon(), is very confusing logic.
> >
> > Put a static function in mm/madvise.c instead - can_madvise_modify() -
> > that spells out exactly what's happening. Also explicitly check for an
> > anon VMA.
> >
> > Also add commentary to explain what's going on.
> >
> > Essentially - we disallow discarding of data in mseal()'d mappings in
> > instances where the user couldn't otherwise write to that data.
> >
> > Shared mappings are always backed, so no discard will actually truly
> > discard the data. Read-only anonymous and MAP_PRIVATE file-backed
> > mappings are the ones we are interested in.
> >
> > We make a change to the logic here to correct a mistake - we must disallow
> > discard of read-only MAP_PRIVATE file-backed mappings, which previously we
> > were not.
> >
> > The justification for this change is to account for the case where:
> >
> > 1. A MAP_PRIVATE R/W file-backed mapping is established.
> > 2. The mapping is written to, which backs it with anonymous memory.
> > 3. The mapping is mprotect()'d read-only.
> > 4. The mapping is mseal()'d.
>
> Thinking about this a bit (should have realized this implication earlier)
Well none of us did...
> ... assuming we have:
>
> 1. A MAP_PRIVATE R/O file-backed mapping.
> 2. The mapping is mseal()'d.
>
> We only really have anon folios in there with things like (a) uprobe (b)
> debugger access (c) similarly weird FOLL_FORCE stuff.
>
> Now, most executables/libraries are mapped that way. If someone would rely
> on MADV_DONTNEED to zap pages in there (to free up memory), that would get
> rejected.
Right, yes.
This is odd behaviour to me. But I guess this is what Jeff meant by 'detecting
this' in android.
The documentation is really not specific enough, we need to fix that. It's
effectively stating any anon mappings are sealed, which is just not true with
existing semantics.
However I see:
Memory sealing can automatically be applied by the runtime loader to
seal .text and .rodata pages and applications can additionally seal
security critical data at runtime.
So yes, we're going to break MADV_DONTNEED of this mappings.
BUT.
Would you really want to MADV_DONTNEED away uprobes etc.?? That seems... very
strange and broken behaviour no?
Note that, also, mappings of read-only files have VM_SHARED stripped. So they
become read-only (With ~VM_MAYWRITE).
To be clear this is where the mode of the file is read-only, not that the
mapping is read-only alone.
So with this change, we'd disallow discard of this.
It'd be pretty odd to mseal() a read-only file-backed mapping and then try to
discard, but maybe somebody would weirdly rely upon this?
It's inconsistent, as a person MAP_SHARED mapping a file that is read/write but
mapped read-only (or r/w of course), can discard fine eve if sealed, but if the
file happens to be read-only can't.
But we could add a VM_MAYWRITE check also.
OK maybe I"m softening on the anon_vma thing see below.
So we could combine these checks to avoid these issues.
>
> Does something like that rely on MADV_DONTNEED working? Good question.
Kees/Jeff? Can you check if android relies on this?
>
> Checking for anon_vma in addition, ad mentioned in the other thread, would
> be a "cheap" check to rule out that there are currently anon vmas in there.
>
> Well, not 100% reliable, because MADV_DONTNEED can race with page faults ...
But hang on, it's read-only so we shouldn't get racing faults... right?
Hmm maybe I'll soften on this anon_vma idea then. Maybe it is a 'cheap fix' to
rule out the _usual_ cases.
We're not changing zapping logic for this though, sorry. That's just a crazy
length to go to.
>
> --
> Cheers,
>
> David / dhildenb
>
In any case, I'm going to send a version of this with the controversial bit
stripped so we (hopefully) land the refactorings for 6.17 and can change
semantics if necessary for 6.18.
Cheers, Lorenzo
Powered by blists - more mailing lists