Message-ID: <fa0d2df8-70ef-4781-9f26-e9bfbfb498df@lucifer.local>
Date: Fri, 7 Nov 2025 12:50:43 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "David Hildenbrand (Red Hat)" <davidhildenbrandkernel@...il.com>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Ryan Roberts <ryan.roberts@....com>,
"Garg, Shivank" <shivankg@....com>,
Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Nico Pache <npache@...hat.com>, Dev Jain <dev.jain@....com>,
Barry Song <baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
zokeefe@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: madvise(MADV_COLLAPSE) fails with EINVAL on dirty file-backed
text pages
On Fri, Nov 07, 2025 at 10:09:41AM +0000, Lorenzo Stoakes wrote:
> On Thu, Nov 06, 2025 at 10:05:41PM +0100, David Hildenbrand (Red Hat) wrote:
> > /*
> >  * The lock of new_folio is still held, we will be blocked in
> >  * the page fault path, which prevents the pte entries from
> >  * being set again. So even though the old empty PTE page may be
> >  * concurrently freed and a new PTE page is filled into the pmd
> >  * entry, it is still empty and can be removed.
> >  *
> >  * So here we only need to recheck if the state of pmd entry
> >  * still meets our requirements, rather than checking pmd_same()
> >  * like elsewhere.
> >  */
> > if (check_pmd_state(pmd) != SCAN_SUCCEED)
> >         goto drop_pml;
> > ptl = pte_lockptr(mm, pmd);
> > if (ptl != pml)
> >         spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> >
> > /*
> >  * Huge page lock is still held, so normally the page table
> >  * must remain empty; and we have already skipped anon_vma
> >  * and userfaultfd_wp() vmas. But since the mmap_lock is not
> >  * held, it is still possible for a racing userfaultfd_ioctl()
> >  * to have inserted ptes or markers. Now that we hold ptlock,
> >  * repeating the anon_vma check protects from one category,
> >  * and repeating the userfaultfd_wp() check from another.
> >  */
> > if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
> >         pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
> >         pmdp_get_lockless_sync();
> >         success = true;
> > }
> >
> > Given !vma->anon_vma, we cannot have anon folios in there.
> >
> > Given !userfaultfd_wp(vma), we cannot have uffd-wp markers in there.
>
> Right.
>
> >
> > Given that all folios in the range we are collapsing were unmapped, we cannot have
> > them mapped there.
> >
> > So the conclusion is that the page table must be empty and can be removed.
> >
> >
> > Could guard markers be in there?
>
> Right now guard markers only exist if vma->anon_vma is set, including the
> file-backed case.
>
> But for file-backed guard regions after my VMA sticky series this won't be the
> case any more :)
>
> So I had better go change that...
>
> I hate that we have open-coded stuff all over the place that makes assumptions
> like this.
>
> This also ignores any other marker types. How I hate the uffd wp implementation.

OK, I audited all vma->anon_vma uses and _this_ is literally the only place that
is affected :)

Thanks for mentioning :P I have written a self-test to repro, and the fix will
land in v3 of the sticky VMA series.

Cheers, Lorenzo