Message-ID: <aRHsSxhIikzC9AAN@kernel.org>
Date: Mon, 10 Nov 2025 15:44:43 +0200
From: Mike Rapoport <rppt@...nel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Peter Xu <peterx@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Arnd Bergmann <arnd@...db.de>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Lance Yang <lance.yang@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>,
Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>,
Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
Kemeng Shi <shikemeng@...weicloud.com>,
Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
SeongJae Park <sj@...nel.org>, Matthew Wilcox <willy@...radead.org>,
Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Jann Horn <jannh@...gle.com>, Miaohe Lin <linmiaohe@...wei.com>,
Naoya Horiguchi <nao.horiguchi@...il.com>,
Pedro Falcato <pfalcato@...e.de>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, linux-s390@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, damon@...ts.linux.dev
Subject: Re: [PATCH v2 01/16] mm: correctly handle UFFD PTE markers
On Mon, Nov 10, 2025 at 01:01:36PM +0000, Lorenzo Stoakes wrote:
> On Mon, Nov 10, 2025 at 01:17:37PM +0200, Mike Rapoport wrote:
> > On Sat, Nov 08, 2025 at 05:08:15PM +0000, Lorenzo Stoakes wrote:
> > > PTE markers were previously only concerned with UFFD-specific logic - that
> > > is, PTE entries with the UFFD WP marker set or those marked via
> > > UFFDIO_POISON.
> > >
> > > However, since the introduction of guard markers in commit
> > > 7c53dfbdb024 ("mm: add PTE_MARKER_GUARD PTE marker"), this is no longer
> > > the case.
> > >
> > > Issues have been avoided as guard regions are not permitted in conjunction
> > > with UFFD, but it still leaves very confusing logic in place, most notably
> > > the misleading and poorly named pte_none_mostly() and
> > > huge_pte_none_mostly().
> > >
> > > This predicate returns true for PTE entries that ought to be treated as
> > > none, but only in certain circumstances, and on the assumption we are
> > > dealing with H/W poison markers or UFFD WP markers.
> > >
> > > This patch removes these functions and makes each invocation of these
> > > functions instead explicitly check what it needs to check.
> > >
> > > As part of this effort it introduces is_uffd_pte_marker() to explicitly
> > > determine if a marker in fact is used as part of UFFD or not.
> > >
> > > In the HMM logic we note that the only time we would need to check for a
> > > fault is in the case of a UFFD WP marker, otherwise we simply encounter a
> > > fault error (VM_FAULT_HWPOISON for H/W poisoned marker, VM_FAULT_SIGSEGV
> > > for a guard marker), so only check for the UFFD WP case.
> > >
> > > While we're here we also refactor code to make it easier to understand.
> > >
> > > Reviewed-by: Vlastimil Babka <vbabka@...e.cz>
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > ---
> > > fs/userfaultfd.c | 83 +++++++++++++++++++----------------
> > > include/asm-generic/hugetlb.h | 8 ----
> > > include/linux/swapops.h | 18 --------
> > > include/linux/userfaultfd_k.h | 21 +++++++++
> > > mm/hmm.c | 2 +-
> > > mm/hugetlb.c | 47 ++++++++++----------
> > > mm/mincore.c | 17 +++++--
> > > mm/userfaultfd.c | 27 +++++++-----
> > > 8 files changed, 123 insertions(+), 100 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index 54c6cc7fe9c6..04c66b5001d5 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -233,40 +233,46 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> > > {
> > > struct vm_area_struct *vma = vmf->vma;
> > > pte_t *ptep, pte;
> > > - bool ret = true;
> > >
> > > assert_fault_locked(vmf);
> > >
> > > ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
> > > if (!ptep)
> > > - goto out;
> > > + return true;
> > >
> > > - ret = false;
> > > pte = huge_ptep_get(vma->vm_mm, vmf->address, ptep);
> > >
> > > /*
> > > * Lockless access: we're in a wait_event so it's ok if it
> > > - * changes under us. PTE markers should be handled the same as none
> > > - * ptes here.
> > > + * changes under us.
> > > */
> > > - if (huge_pte_none_mostly(pte))
> > > - ret = true;
> > > +
> > > + /* If missing entry, wait for handler. */
> >
> > It's actually the #PF handler that waits ;-)
>
> I think I meant the uffd userland 'handler', as in handle_userfault(). But
> this obviously isn't clear.
>
> >
> > When userfaultfd_(huge_)must_wait() returns true, the process that caused
> > the fault should wait until userspace resolves the fault; returning false
> > means that it's ok to retry the #PF.
>
> Yup.
>
> >
> > So the comment here should probably read as
> >
> > /* entry is still missing, wait for userspace to resolve the fault */
> >
>
> Will update to make clearer thanks.
>
> >
> > > + if (huge_pte_none(pte))
> > > + return true;
> > > + /* UFFD PTE markers require handling. */
> > > + if (is_uffd_pte_marker(pte))
> > > + return true;
> > > + /* If VMA has UFFD WP faults enabled and WP fault, wait for handler. */
> > > if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
> > > - ret = true;
> > > -out:
> > > - return ret;
> > > + return true;
> > > +
> > > + /* Otherwise, if entry isn't present, let fault handler deal with it. */
> >
> > Entry is actually present here, e.g. because another thread called
> > UFFDIO_COPY in parallel with the fault, so there is no need to block the
> > faulting process.
>
> Well, it might not be? It could be a swap entry, a migration entry, etc.,
> unless I'm missing cases. The point of the comment was: 'if the entry is
> non-present in a way that doesn't require a userfaultfd userland handler,
> the fault handler will deal with it'.
>
> But anyway, I agree this isn't clear; it's probably better to just say
> 'otherwise, no need for the userland uffd handler to do anything here' or
> similar.
It's not that userspace does not need to do anything; it's just that the pte
is good enough for the faulting thread to retry the page fault without
waiting for userspace to resolve the fault.
> Cheers, Lorenzo
--
Sincerely yours,
Mike.