Message-ID: <253140b0-c9b9-4ef0-8b36-af307296519b@kernel.org>
Date: Fri, 9 Jan 2026 15:11:40 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Lance Yang <lance.yang@...ux.dev>
Cc: dave.hansen@...el.com, dave.hansen@...ux.intel.com, will@...nel.org,
aneesh.kumar@...nel.org, npiggin@...il.com, peterz@...radead.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
hpa@...or.com, arnd@...db.de, akpm@...ux-foundation.org,
lorenzo.stoakes@...cle.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
dev.jain@....com, baohua@...nel.org, shy828301@...il.com, riel@...riel.com,
jannh@...gle.com, linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, ioworker0@...il.com
Subject: Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush
already synchronized
On 1/7/26 07:37, Lance Yang wrote:
> Hi David,
>
> On 2026/1/7 00:10, Lance Yang wrote:
> [..]
>>> What could work is tracking "tlb_table_flush_sent_ipi" really when we
>>> are flushing the TLB for removed/unshared tables, and maybe resetting
>>> it ... I don't know when from the top of my head.
>>
>
> Seems like we could fix the issue of the flag's lifetime being broken
> when the MMU gather gets reused, by splitting the flush and the reset.
> This ensures the flag stays valid between the flush and the sync.
>
> Now tlb_flush_unshared_tables() does:
> 1) __tlb_flush_mmu_tlbonly() - flush only, keeps flags alive
> 2) tlb_gather_remove_table_sync_one() - can check the flag
> 3) __tlb_reset_range() - reset everything after sync
>
> Something like this:
>
> ---8<---
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 3975f7d11553..a95b054dfcca 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -415,6 +415,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> tlb->cleared_puds = 0;
> tlb->cleared_p4ds = 0;
> tlb->unshared_tables = 0;
> + tlb->tlb_flush_sent_ipi = 0;
As raised, the "tlb_flush_sent_ipi" name is confusing when we send IPIs
to different CPUs depending on whether we are removing page tables or
not. I think you would really want to track that explicitly as
"tlb_table_flush_sent_ipi"?
> /*
> * Do not reset mmu_gather::vma_* fields here, we do not
> * call into tlb_start_vma() again to set them if there is an
> @@ -492,7 +493,7 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
> tlb->vma_pfn |= !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
> }
>
> -static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> +static inline void __tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> {
> /*
> * Anything calling __tlb_adjust_range() also sets at least one of
> @@ -503,6 +504,11 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> return;
>
> tlb_flush(tlb);
> +}
> +
> +static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> +{
> + __tlb_flush_mmu_tlbonly(tlb);
> __tlb_reset_range(tlb);
> }
>
> @@ -824,7 +830,7 @@ static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
> * flush the TLB for the unsharer now.
> */
> if (tlb->unshared_tables)
> - tlb_flush_mmu_tlbonly(tlb);
> + __tlb_flush_mmu_tlbonly(tlb);
>
> /*
> * Similarly, we must make sure that concurrent GUP-fast will not
> @@ -834,14 +840,16 @@ static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
> * We only perform this when we are the last sharer of a page table,
> * as the IPI will reach all CPUs: any GUP-fast.
> *
> - * Note that on configs where tlb_remove_table_sync_one() is a NOP,
> - * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
> - * required IPIs already for us.
> + * Use tlb_gather_remove_table_sync_one() instead of
> + * tlb_remove_table_sync_one() to skip the redundant IPI if the
> + * TLB flush above already sent one.
> */
> if (tlb->fully_unshared_tables) {
> - tlb_remove_table_sync_one();
> + tlb_gather_remove_table_sync_one(tlb);
> tlb->fully_unshared_tables = false;
> }
> +
> + __tlb_reset_range(tlb);
> }
> #endif /* CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
> ---
>
> For khugepaged, it should be fine - it uses a local mmu_gather that
> doesn't get reused. The lifetime is simply:
>
> tlb_gather_mmu() → flush → sync → tlb_finish_mmu()
>
> Let me know if this addresses your concern :)
I'll probably have to see the full picture. But this lifetime stuff in
core-mm ends up getting more complicated than v2 without a clear benefit
to me (except maybe handling some x86 oddities better ;) )
--
Cheers
David