Message-ID: <053f3a04-9195-4f8d-8959-42e0c3ba077b@lucifer.local>
Date: Thu, 30 Oct 2025 16:43:57 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Pedro Falcato <pfalcato@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>, David Hildenbrand <david@...hat.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Jann Horn <jannh@...gle.com>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, Andrei Vagin <avagin@...il.com>
Subject: Re: [PATCH 1/3] mm: introduce VM_MAYBE_GUARD and make visible for
 guard regions
On Thu, Oct 30, 2025 at 04:31:56PM +0000, Pedro Falcato wrote:
> On Thu, Oct 30, 2025 at 04:23:58PM +0000, Lorenzo Stoakes wrote:
> > On Thu, Oct 30, 2025 at 04:16:20PM +0000, Pedro Falcato wrote:
> > > On Wed, Oct 29, 2025 at 04:50:31PM +0000, Lorenzo Stoakes wrote:
> > > > Currently, if a user needs to determine if guard regions are present in a
> > > > range, they have to scan all VMAs (or have knowledge of which ones might
> > > > have guard regions).
> > > >
> > > > Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to
> > > > pagemap") and the related commit a516403787e0 ("fs/proc: extend the
> > > > PAGEMAP_SCAN ioctl to report guard regions"), users can use either
> > > > /proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
> > > > operation at a virtual address level.
> > > >
> > > > This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
> > > > that guard regions exist in ranges.
> > > >
> > > > This patch remedies the situation by establishing a new VMA flag,
> > > > VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
> > > > uncertain because we cannot reasonably determine whether a
> > > > MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
> > > > additionally VMAs may change across merge/split).
> > > >
> > > > We utilise 0x800 for this flag, which makes it available to 32-bit
> > > > architectures too. This bit was previously used by VM_DENYWRITE, which
> > > > was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and
> > > > hasn't been reused yet.
> > > >
> > > > The MADV_GUARD_INSTALL madvise() operation now must take an mmap write
> > > > lock (and also VMA write lock) whereas previously it did not, but this
> > > > seems a reasonable overhead.
> > >
> > > Do you though? Could it be possible to simply atomically set the flag with
> > > the read lock held? This would make it so we can't split the VMA (and tightly
> >
> > VMA flags are not accessed atomically so no I don't think we can do that in any
> > workable way.
> >
>
> FWIW I think you could work it as an atomic flag and treat those races as benign
> (this one, at least).
It's not benign as we need to ensure that page tables are correctly propagated
on fork.
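To make that concrete - a sketch only, not the patch itself - the shape of the
problem is in something like vma_needs_copy() in mm/memory.c: if the flag isn't
reliably visible there, the page tables holding the guard PTE markers simply
never get copied into the child.

static bool vma_needs_copy(struct vm_area_struct *dst_vma,
                           struct vm_area_struct *src_vma)
{
        if (userfaultfd_wp(dst_vma))
                return true;

        if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                return true;

        if (src_vma->anon_vma)
                return true;

        /*
         * Sketch: a VMA with no anon pages can still carry guard PTE
         * markers, so the copy can't be skipped when the flag is set -
         * and a racily-set flag might not be visible here on fork.
         */
        if (src_vma->vm_flags & VM_MAYBE_GUARD)
                return true;

        return false;
}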
>
> > I also don't think it's at all necessary, see below.
> >
> > > define what "may have a guard page"), but it sounds much better than introducing
> > > lock contention. I don't think it is reasonable to add a write lock to a feature
> > > that may be used by such things as thread stack allocation, malloc, etc.
> >
> > What lock contention? It's per-VMA so the contention is limited to the VMA in
> > question, and only over the span of time you are setting the guard region.
>
> Don't we always need to take the mmap write lock when grabbing a VMA write
> lock as well?
Yup. But at the same time, for the kind of operations that'd use this you'd
already be taking the lock anyway.
You don't hold it for long, and you won't be doing this any more often than
you'd be doing other write operations. You're also not going to be holding up
faults on other VMAs (those can still be handled despite the mmap write lock
being held), so I don't think there's any issue here.
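For illustration, the locking shape being discussed is roughly this (a sketch,
not the literal madvise() code - VM_MAYBE_GUARD is the new flag, the rest are
existing mm helpers):

        struct vm_area_struct *vma;

        mmap_write_lock(mm);
        vma = find_vma(mm, addr);
        if (vma) {
                vma_start_write(vma);           /* per-VMA write lock */
                vm_flags_set(vma, VM_MAYBE_GUARD);
                /* ... install guard PTE markers across the range ... */
        }
        mmap_write_unlock(mm);

Faults against every other VMA carry on under their own per-VMA locks the
whole time, so the contention really is scoped to the single VMA being
modified.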
>
> > When allocating thread stacks you'll be mapping things into memory which... take
> > the write lock. malloc() if it goes to the kernel will also take the write lock.
> >
>
> But yes, good point, you're already serializing anyway. I don't think this is
> a big deal.
Indeed
>
> > So I think you're overly worried about an operation that a. isn't going to be
> > something that happens all that often, b. when it does, it's at a time when
> > you'd be taking write locks anyway and c. won't contend important stuff like
> > page faults for any VMA other than the one having the guard region
> > installed.
>
> Yep, thanks.
No problemo, you can get yourself into sticky situations with lock contention
but I think this is not one! :)
>
> --
> Pedro
Cheers, Lorenzo