lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4037214b-d1c6-4e0b-ad9d-6722aea7aba9@lucifer.local>
Date: Tue, 16 Dec 2025 10:45:43 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-mm@...ck.org, Will Deacon <will@...nel.org>,
        "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Nick Piggin <npiggin@...il.com>, Peter Zijlstra <peterz@...radead.org>,
        Arnd Bergmann <arnd@...db.de>, Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Pedro Falcato <pfalcato@...e.de>, Rik van Riel <riel@...riel.com>,
        Harry Yoo <harry.yoo@...cle.com>,
        Laurence Oberman <loberman@...hat.com>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        Nadav Amit <nadav.amit@...il.com>, stable@...r.kernel.org
Subject: Re: [PATCH v1 4/4] mm/hugetlb: fix excessive IPI broadcasts when
 unsharing PMD tables using mmu_gather

On Mon, Dec 15, 2025 at 03:47:57PM +0100, David Hildenbrand (Red Hat) wrote:
>
> > > >
> > > > As Nadav points out, should also initialise fully_unshared_tables.
> > >
> > > Right, but on an earlier init path, not on the range reset path here.
> >
> > Shouldn't we reset it also?
> >
> > I mean __tlb_reset_range() is also called by __tlb_gather_mmu() (invoked by
> > tlb_gather_mmu[_fullmm]()).
> >
>
> __tlb_reset_range() is all about flushing the TLB.
>
> In v2 I have:

(Was waiting for reply here before looking there of course :)

>
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 247e3f9db6c7a..030a162a263ba 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -426,6 +426,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
>  #endif
>         tlb->vma_pfn = 0;
> +       tlb->fully_unshared_tables = 0;
>         __tlb_reset_range(tlb);
>         inc_tlb_flush_pending(tlb->mm);
>  }
>

OK so I guess since this isn't tied to flushing the TLB, but rather only to the
IPI we're good.

>
> > >
> > > >
> > > > >    	/*
> > > > >    	 * Do not reset mmu_gather::vma_* fields here, we do not
> > > > >    	 * call into tlb_start_vma() again to set them if there is an
> > > > > @@ -484,7 +496,7 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> > > > >    	 * these bits.
> > > > >    	 */
> > > > >    	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
> > > > > -	      tlb->cleared_puds || tlb->cleared_p4ds))
> > > > > +	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
> > > >
> > > > What about fully_unshared_tables? I guess though unshared_tables implies
> > > > fully_unshared_tables.
> > >
> > > fully_unshared_tables is only for triggering IPIs and consequently not about
> > > flushing TLBs.
> > >
> > > The TLB part is taken care of by unshared_tables, and we will always set
> > > unshared_tables when unsharing any page tables (incl. fully unshared ones).
> >
> > OK, so is there ever a situation where fully_unshared_tables would be set
> > without unshared_tables? Presumably not.
>
> tlb_unshare_pmd_ptdesc() will always set "unshared_tables" but only conditionally
> sets "fully_unshared_tables".

Of course.

>
> "unshared_tables" might get handled by a prior TLB flush,
> leaving only fully_unshared_tables set to perform the IPI in tlb_flush_unshared_tables().

I see, right, so we can in fact have one set without the other.

But this logic pertains to the TLB flush so we're ok.

>
> So the important part is that whenever we unshare, we set "unshared_tables".

Yes.

>
> [...]
>
> > > > > +{
> > > > > +	/*
> > > > > +	 * As soon as the caller drops locks to allow for reuse of
> > > > > +	 * previously-shared tables, these tables could get modified and
> > > > > +	 * even reused outside of hugetlb context. So flush the TLB now.
> > > >
> > > > Hmm but you're doing this in both the case of unshare and fully unsharing, so is
> > > > this the right place to make this comment?
> > >
> > > That's why I start the comment below with "Similarly", to make it clear that
> > > the comments build up on each other.
> > >
> > > But I'm afraid I might not be getting your point fully here :/
> >
> > what I mean is, if we are not at the point of the table being fully unshared,
> > nobody else can come in and reuse it right? Because we're still using it, just
> > dropped a ref + flushed tlb?
>
> After we drop the lock, someone else could fully unshare it. And that other (MM) would
> not be able to flush the TLB for us -- in contrast to the IPI that would affect all
> CPUs.

Ah so it's about TLB flushing for _us_ whereas the other user who fully unshares
will flush for _them_ which may not affect the CPUs we're using.

Yeah what fun this is :)

>
> >
> > Isn't really the correct comment here that ranges that previously mapped the
> > shared pages might no longer, so we must clear the TLB? I may be missing
> > something :)
>
> There are cases where we defer flushing the TLB until we dropped all (exclusive) locks.
> In particular, MADV_DONTNEED does that in some cases, essentially deferring the flush
> to the tlb_finish_mmu().
>
> free_pgtables() will also defer the flush, performing the TLB flush during tlb_finish_mmu(),
> before
>
> The point is (as I tried to make clear in the comment), for unsharing we have no control
> whenn the page table gets freed after we drop the lock.
>
> So we must flush the TLB now and cannot defer it like we do in the other cases.

Yeah I guess because of the above - that is - other users may unshare for their
CPUs but not unshare for ours?

>
> >
> > Or maybe the right thing is 'we must always flush the TLB because <blahdyblah>,
> > and if we are fully unsharing tables we must avoid reuse of previously-shared
> > tables when the caller drops the locks' or something?
>
> I hope the description above made it clearer why I spell out that the TLB must be flushed
> now.

Yeah I think so thanks :)

>
> >
> > >
> > > >
> > > > Surely here this is about flushing TLBs for the unsharer only as it no longer
> > > > uses it?
> > > >
> > > > > +	 *
> > > > > +	 * Note that we cannot defer the flush to a later point even if we are
> > > > > +	 * not the last sharer of the page table.
> > > > > +	 */
> > > >
> > > > Not hugely clear, some double negative here. Maybe worth saying something like:
> > > >
> > > > 'Even if we are not fully unsharing a PMD table, we must flush the TLB for the
> > > > unsharer who no longer has access to this memory'
> > > >
> > > > Or something? Assuming this is accurate :)
> > >
> > > I'll adjust it to "Not that even if we are not fully unsharing a PMD table,
> > > we must flush the TLB for the unsharer now.".
> >
> > I guess you mean Note or that's even more confusing :P
>
> :) Yeah, I did that in v2.

Thanks :)

>
> --
> Cheers
>
> David

Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ