lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5ac97930-b5d1-4a9d-bd49-5a09c823a920@lucifer.local>
Date: Wed, 19 Nov 2025 09:27:10 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
        Zi Yan <ziy@...dia.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, Andrei Vagin <avagin@...il.com>
Subject: Re: [PATCH v4 6/9] mm: set the VM_MAYBE_GUARD flag on guard region
 install

On Wed, Nov 19, 2025 at 10:16:14AM +0100, David Hildenbrand (Red Hat) wrote:
> > +/* Can we retract page tables for this file-backed VMA? */
> > +static bool file_backed_vma_is_retractable(struct vm_area_struct *vma)
>
> It's not really the VMA that is retractable :)
>
> Given that the function we are called this from is called
> "retract_page_tables" (and not file_backed_...) I guess I would just have
> called this
>
> "page_tables_are_retractable"
>
> "page_tables_support_retract"
>
> Or sth. along those lines.

Well it's specific to the VMA and it starts getting messy, this is the problem
with naming, you can_end_up_with_way_too_specific_names :)

Also you'd need to say file-baked for clarity really, and that's getting far too
long...

I think this is fine as-is given it's a static function, a user thinking 'what
does retractable mean?' can see right away in the comment immdiately above the
function name.

>
> > +{
> > +	/*
> > +	 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
> > +	 * got written to. These VMAs are likely not worth removing
> > +	 * page tables from, as PMD-mapping is likely to be split later.
> > +	 */
> > +	if (READ_ONCE(vma->anon_vma))
> > +		return false;
> > +
> > +	/*
> > +	 * When a vma is registered with uffd-wp, we cannot recycle
> > +	 * the page table because there may be pte markers installed.
> > +	 * Other vmas can still have the same file mapped hugely, but
> > +	 * skip this one: it will always be mapped in small page size
> > +	 * for uffd-wp registered ranges.
> > +	 */
> > +	if (userfaultfd_wp(vma))
> > +		return false;
> > +
> > +	/*
> > +	 * If the VMA contains guard regions then we can't collapse it.
> > +	 *
> > +	 * This is set atomically on guard marker installation under mmap/VMA
> > +	 * read lock, and here we may not hold any VMA or mmap lock at all.
> > +	 *
> > +	 * This is therefore serialised on the PTE page table lock, which is
> > +	 * obtained on guard region installation after the flag is set, so this
> > +	 * check being performed under this lock excludes races.
> > +	 */
> > +	if (vma_flag_test_atomic(vma, VM_MAYBE_GUARD_BIT))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >   static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> >   {
> >   	struct vm_area_struct *vma;
> > @@ -1724,14 +1761,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> >   		spinlock_t *ptl;
> >   		bool success = false;
> > -		/*
> > -		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
> > -		 * got written to. These VMAs are likely not worth removing
> > -		 * page tables from, as PMD-mapping is likely to be split later.
> > -		 */
> > -		if (READ_ONCE(vma->anon_vma))
> > -			continue;
> > -
> >   		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
> >   		if (addr & ~HPAGE_PMD_MASK ||
> >   		    vma->vm_end < addr + HPAGE_PMD_SIZE)
> > @@ -1743,14 +1772,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> >   		if (hpage_collapse_test_exit(mm))
> >   			continue;
> > -		/*
> > -		 * When a vma is registered with uffd-wp, we cannot recycle
> > -		 * the page table because there may be pte markers installed.
> > -		 * Other vmas can still have the same file mapped hugely, but
> > -		 * skip this one: it will always be mapped in small page size
> > -		 * for uffd-wp registered ranges.
> > -		 */
> > -		if (userfaultfd_wp(vma))
> > +
> > +		if (!file_backed_vma_is_retractable(vma))
> >   			continue;
> >   		/* PTEs were notified when unmapped; but now for the PMD? */
> > @@ -1777,15 +1800,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> >   			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> >   		/*
> > -		 * Huge page lock is still held, so normally the page table
> > -		 * must remain empty; and we have already skipped anon_vma
> > -		 * and userfaultfd_wp() vmas.  But since the mmap_lock is not
> > -		 * held, it is still possible for a racing userfaultfd_ioctl()
> > -		 * to have inserted ptes or markers.  Now that we hold ptlock,
> > -		 * repeating the anon_vma check protects from one category,
> > -		 * and repeating the userfaultfd_wp() check from another.
> > +		 * Huge page lock is still held, so normally the page table must
> > +		 * remain empty; and we have already skipped anon_vma and
> > +		 * userfaultfd_wp() vmas.  But since the mmap_lock is not held,
> > +		 * it is still possible for a racing userfaultfd_ioctl() or
> > +		 * madvise() to have inserted ptes or markers.  Now that we hold
> > +		 * ptlock, repeating the retractable checks protects us from
> > +		 * races against the prior checks.
> >   		 */
> > -		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
> > +		if (likely(file_backed_vma_is_retractable(vma))) {
> >   			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
> >   			pmdp_get_lockless_sync();
> >   			success = true;
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 0b3280752bfb..5dbe40be7c65 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1141,15 +1141,21 @@ static long madvise_guard_install(struct madvise_behavior *madv_behavior)
> >   		return -EINVAL;
> >   	/*
> > -	 * If we install guard markers, then the range is no longer
> > -	 * empty from a page table perspective and therefore it's
> > -	 * appropriate to have an anon_vma.
> > -	 *
> > -	 * This ensures that on fork, we copy page tables correctly.
> > +	 * Set atomically under read lock. All pertinent readers will need to
> > +	 * acquire an mmap/VMA write lock to read it. All remaining readers may
> > +	 * or may not see the flag set, but we don't care.
> > +	 */
> > +	vma_flag_set_atomic(vma, VM_MAYBE_GUARD_BIT);
> > +
>
> In general LGTM.

Thanks

>
> > +	/*
> > +	 * If anonymous and we are establishing page tables the VMA ought to
> > +	 * have an anon_vma associated with it.
>
> Do you know why? I know that as soon as we have anon folios in there we need
> it, but is it still required for guard regions? Patch #5 should handle the
> for case I guess.
>
> Which other code depends on that?

There seems to be a general convention of people seeing 'vma->anon_vma' as
meaning it has page tables, and vice-versa, for anon VMAs.

Obviously we change fork behaviour for this now with the flag, and _perhaps_
it's not necessary, but I'd rather keep this consistent for now (this is what we
were doing before) and come back to it, rather than audit the code base for
assumptions.

I'd probably like to do a patch adding vma_has_page_tables() or something that
EXPLICITLY spells this out for cases that need it.

And it's not really overhead, as there'd not be much point in guard regions if
you didn't fault in the memory (running off the end of empty range doesn't
really make snese).

The key change here is that file-backed guard regions no longer do the horrible
thing of having your file-backed VMA act as if it were MAP_PRIVATE with its own
pointless anon_vma just to get correct fork behaviour... :)

>
> --
> Cheers
>
> David

Thanks, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ