Message-ID: <4f66d89a-631a-43eb-b4f9-c9a0b44caaae@redhat.com>
Date: Fri, 25 Jul 2025 00:12:54 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Jeff Xu <jeffxu@...omium.org>
Subject: Re: [PATCH v3 2/5] mm/mseal: update madvise() logic
On 16.07.25 19:38, Lorenzo Stoakes wrote:
> The madvise() logic is inexplicably performed in mm/mseal.c - this ought
> to be located in mm/madvise.c.
>
> Additionally can_modify_vma_madv() is inconsistently named and, in
> combination with is_ro_anon(), is very confusing logic.
>
> Put a static function in mm/madvise.c instead - can_madvise_modify() -
> that spells out exactly what's happening. Also explicitly check for an
> anon VMA.
>
> Also add commentary to explain what's going on.
>
> Essentially - we disallow discarding of data in mseal()'d mappings in
> instances where the user couldn't otherwise write to that data.
>
> Shared mappings are always backed, so no discard will actually truly
> discard the data. Read-only anonymous and MAP_PRIVATE file-backed
> mappings are the ones we are interested in.
>
> We make a change to the logic here to correct a mistake - we must disallow
> discard of read-only MAP_PRIVATE file-backed mappings, which previously we
> were not.
>
> The justification for this change is to account for the case where:
>
> 1. A MAP_PRIVATE R/W file-backed mapping is established.
> 2. The mapping is written to, which backs it with anonymous memory.
> 3. The mapping is mprotect()'d read-only.
> 4. The mapping is mseal()'d.

Thinking about this a bit (I should have realized this implication
earlier) ... assuming we have:

1. A MAP_PRIVATE R/O file-backed mapping.
2. The mapping is mseal()'d.

We only really get anon folios in there through things like (a) uprobes,
(b) debugger access, or (c) similarly weird FOLL_FORCE stuff.

Now, most executables/libraries are mapped that way. If someone relied
on MADV_DONTNEED to zap pages in there (to free up memory), that would
now get rejected.

Does something like that rely on MADV_DONTNEED working? Good question.

Checking for anon_vma in addition, as mentioned in the other thread,
would be a "cheap" check to rule out that there are currently anon folios
in there.

Well, not 100% reliable, because MADV_DONTNEED can race with page faults ...

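To make the two variants concrete, here is a minimal user-space sketch of the decision being discussed. It is only a model under stated assumptions: struct vma_model, its fields, and both function names are hypothetical and are not the kernel's data structures or the patch's actual code. It assumes the advice being checked is already known to be a discard operation.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of a mapping. The fields are illustrative
 * assumptions and do not mirror the kernel's struct vm_area_struct.
 */
struct vma_model {
	bool sealed;       /* mseal() was applied to the mapping */
	bool shared;       /* MAP_SHARED rather than MAP_PRIVATE */
	bool writable;     /* the user could write to the data anyway */
	bool has_anon_vma; /* anonymous memory may have been faulted in */
};

/*
 * Simplified reading of the rule in the quoted commit message: a discard
 * is refused only for sealed, private mappings the user cannot write to.
 */
static inline bool discard_allowed(const struct vma_model *vma)
{
	if (!vma->sealed)
		return true;
	if (vma->shared)
		return true;
	if (vma->writable)
		return true;
	return false;
}

/*
 * Variant with the additional anon_vma check: if no anon_vma is attached,
 * assume there is nothing private to discard and let the request through.
 */
static inline bool discard_allowed_anon_check(const struct vma_model *vma)
{
	if (discard_allowed(vma))
		return true;
	return !vma->has_anon_vma;
}
```

For a sealed R/O MAP_PRIVATE file-backed mapping with no anon_vma, discard_allowed() refuses the request while discard_allowed_anon_check() lets it through. Because of the page fault race, the second variant is a best-effort filter rather than a guarantee.
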
--
Cheers,
David / dhildenb