Message-ID: <1b27a3fa-359a-43d0-bdeb-c31341749367@kernel.org>
Date: Wed, 31 Dec 2025 13:33:30 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Dave Hansen <dave.hansen@...el.com>, Lance Yang <lance.yang@...ux.dev>,
akpm@...ux-foundation.org
Cc: will@...nel.org, aneesh.kumar@...nel.org, npiggin@...il.com,
peterz@...radead.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, arnd@...db.de,
lorenzo.stoakes@...cle.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
dev.jain@....com, baohua@...nel.org, ioworker0@...il.com,
shy828301@...il.com, riel@...riel.com, jannh@...gle.com,
linux-arch@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] skip redundant TLB sync IPIs
On 12/31/25 05:26, Dave Hansen wrote:
> On 12/29/25 06:52, Lance Yang wrote:
> ...
>> This series introduces a way for architectures to indicate their TLB flush
>> already provides full synchronization, allowing the redundant IPI to be
>> skipped. For now, the optimization is implemented for x86 first and applied
>> to all page table operations that free or unshare tables.
>
> I really don't like all the complexity here. Even on x86, there are
> three or more ways of deriving this. Having the pv_ops check the value
> of another pv op is also a bit unsettling.

Right. What I actually meant is that we simply have a property "bool
flush_tlb_multi_implies_ipi_broadcast" that is only ever set to true, and
only from the initialization code, without comparing the pv_ops.

That should reduce the complexity quite a bit IMHO.

But maybe you have an even better, very simple way to indicate support.
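
To illustrate the idea, here is a minimal userspace sketch, not the actual
patch. Only the flag name comes from the text above; every model_*() helper
and the counter are hypothetical stand-ins, and the real kernel call sites
would look different.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code.
 *
 * The flag starts out false and is set to true at most once, from
 * initialization code, when the TLB flush is known to IPI the target CPUs.
 */
static bool flush_tlb_multi_implies_ipi_broadcast;

/* Counts simulated synchronization broadcasts (assumption, for the model). */
static int nr_sync_ipis;

/* Stand-in for init code of a configuration that flushes via IPIs. */
static void model_init_ipi_based_flush(void)
{
	flush_tlb_multi_implies_ipi_broadcast = true;
}

/* Stand-in for tlb_remove_table_sync_one(): only records the broadcast. */
static void model_sync_one(void)
{
	nr_sync_ipis++;
}

/* Called after the TLB flush when a page table is freed or unshared. */
static void model_sync_after_flush(void)
{
	/* The flush already interrupted the relevant CPUs: skip the second IPI. */
	if (flush_tlb_multi_implies_ipi_broadcast)
		return;
	model_sync_one();
}
```

The point of the sketch is that callers only read a single boolean; nothing
is derived by comparing one pv op against another.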
>
> That said, complexity can be worth it with sufficient demonstrated
> gains. But:
>
>> When unsharing hugetlb PMD page tables or collapsing pages in khugepaged,
>> we send two IPIs: one for TLB invalidation, and another to synchronize
>> with concurrent GUP-fast walkers.
>
> Those aren't exactly hot paths. khugepaged is fundamentally rate
> limited. I don't think unsharing hugetlb PMD page tables just is all
> that common either.

Given that the added IPIs during unsharing broke Oracle DBs rather badly
[1], I think this is actually a case worth optimizing.

I'd assume that the impact can be measured on a many-core/many-socket
system with an adjusted reproducer of [1]. The impact will not be as big
as what [1] fixed (we reduced the tlb_remove_table_sync_one()
invocations quite drastically).

After all, tlb_remove_table_sync_one() sends an IPI to *all* CPUs in the
system, not just the ones in the MM CPU mask, which is rather bad on
systems with a lot of CPUs. Of course, this way we can only optimize on
systems that actually send IPIs during TLB flushes.

For other systems, it will be more tricky to avoid these broadcast IPIs.
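
As a rough back-of-the-envelope model of why the broadcast hurts on big
machines, the following sketch counts IPI targets for a system-wide
broadcast versus a flush limited to an MM CPU mask. The function names, the
64-bit mask and the "sender excluded" convention are assumptions made for
this illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical model: IPIs needed to reach every other online CPU,
 * as a system-wide broadcast would.
 */
static unsigned int model_broadcast_ipis(unsigned int nr_online_cpus)
{
	return nr_online_cpus ? nr_online_cpus - 1 : 0;
}

/*
 * Hypothetical model: IPIs needed to reach only the CPUs set in
 * mm_cpumask, excluding the sending CPU `self` (must be < 64).
 */
static unsigned int model_mask_ipis(uint64_t mm_cpumask, unsigned int self)
{
	uint64_t targets = mm_cpumask & ~(UINT64_C(1) << self);
	unsigned int n = 0;

	/* Clear the lowest set bit until none remain (population count). */
	while (targets) {
		targets &= targets - 1;
		n++;
	}
	return n;
}
```

For example, in this model a process confined to 4 CPUs on a 64-CPU system
needs 3 IPIs for a mask-limited flush from one of those CPUs, but 63 for the
broadcast.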

(I have the faint recollection that the IPI broadcast through
tlb_remove_table_sync_one() is a problem when called from
__tlb_remove_table_one() on RT systems ...)

[1] https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org

--
Cheers
David