Message-ID: <fa0d2df8-70ef-4781-9f26-e9bfbfb498df@lucifer.local>
Date: Fri, 7 Nov 2025 12:50:43 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "David Hildenbrand (Red Hat)" <davidhildenbrandkernel@...il.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Ryan Roberts <ryan.roberts@....com>,
        "Garg, Shivank" <shivankg@....com>,
        Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Dev Jain <dev.jain@....com>,
        Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        zokeefe@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: madvise(MADV_COLLAPSE) fails with EINVAL on dirty file-backed
 text pages

On Fri, Nov 07, 2025 at 10:09:41AM +0000, Lorenzo Stoakes wrote:
> On Thu, Nov 06, 2025 at 10:05:41PM +0100, David Hildenbrand (Red Hat) wrote:
> > 		/*
> > 		 * The lock of new_folio is still held, we will be blocked in
> > 		 * the page fault path, which prevents the pte entries from
> > 		 * being set again. So even though the old empty PTE page may be
> > 		 * concurrently freed and a new PTE page is filled into the pmd
> > 		 * entry, it is still empty and can be removed.
> > 		 *
> > 		 * So here we only need to recheck if the state of pmd entry
> > 		 * still meets our requirements, rather than checking pmd_same()
> > 		 * like elsewhere.
> > 		 */
> > 		if (check_pmd_state(pmd) != SCAN_SUCCEED)
> > 			goto drop_pml;
> > 		ptl = pte_lockptr(mm, pmd);
> > 		if (ptl != pml)
> > 			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> >
> > 		/*
> > 		 * Huge page lock is still held, so normally the page table
> > 		 * must remain empty; and we have already skipped anon_vma
> > 		 * and userfaultfd_wp() vmas.  But since the mmap_lock is not
> > 		 * held, it is still possible for a racing userfaultfd_ioctl()
> > 		 * to have inserted ptes or markers.  Now that we hold ptlock,
> > 		 * repeating the anon_vma check protects from one category,
> > 		 * and repeating the userfaultfd_wp() check from another.
> > 		 */
> > 		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
> > 			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
> > 			pmdp_get_lockless_sync();
> > 			success = true;
> > 		}
> >
> > Given !vma->anon_vma, we cannot have anon folios in there.
> >
> > Given !userfaultfd_wp(vma), we cannot have uffd-wp markers in there.
>
> Right.
>
> >
> > Given that all folios in the range we are collapsing were unmapped, we cannot have
> > them mapped there.
> >
> > So the conclusion is that the page table must be empty and can be removed.
> >
> >
> > Could guard markers be in there?
>
> Right now guard markers only exist if vma->anon_vma is set, including the
> file-backed case.
>
> But for file-backed guard regions after my VMA sticky series this won't be the
> case any more :)
>
> So I had better go change that...
>
> I hate that we have open-coded stuff all over the place that makes assumptions
> like this.
>
> This also ignores any other marker types. How I hate the uffd wp implementation.

OK I audited all vma->anon_vma uses and _this_ is literally the only place that
is affected :)

Thanks for mentioning it :P I have written a self test to repro this, and the
fix will land in v3 of the sticky VMA series.
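
To spell out the assumption being discussed, here is a rough userspace toy
model in C. It is not kernel code and not the self test mentioned above: the
struct and every toy_* name are hypothetical stand-ins, and it only mirrors
the !vma->anon_vma && !userfaultfd_wp(vma) check quoted earlier. It shows why
that check stops implying "no guard markers in the page table" once a
file-backed guard region can exist without vma->anon_vma being set.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for a VMA. None of these
 * fields exist under these names in the kernel.
 */
struct toy_vma {
	bool has_anon_vma;     /* models vma->anon_vma != NULL */
	bool uffd_wp;          /* models userfaultfd_wp(vma) */
	bool has_guard_marker; /* a guard marker sits in the PTE page table */
};

/* Models the open-coded check guarding pmdp_collapse_flush(). */
static bool toy_may_remove_page_table(const struct toy_vma *vma)
{
	return !vma->has_anon_vma && !vma->uffd_wp;
}

/* True if removing the page table would silently drop a guard marker. */
static bool toy_loses_guard_marker(const struct toy_vma *vma)
{
	return toy_may_remove_page_table(vma) && vma->has_guard_marker;
}
```

With has_anon_vma = true and has_guard_marker = true (the situation today,
where guard markers imply anon_vma), toy_loses_guard_marker() returns false.
With has_anon_vma = false and has_guard_marker = true (what the sticky series
would allow for file-backed guard regions), it returns true, which is the
case the check needs to account for.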

Cheers, Lorenzo
