Message-ID: <bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org>
Date: Fri, 9 Jan 2026 16:40:19 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Lance Yang <lance.yang@...ux.dev>, dave.hansen@...el.com
Cc: dave.hansen@...ux.intel.com, will@...nel.org, aneesh.kumar@...nel.org,
npiggin@...il.com, peterz@...radead.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, x86@...nel.org, hpa@...or.com,
arnd@...db.de, akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
baohua@...nel.org, shy828301@...il.com, riel@...riel.com, jannh@...gle.com,
linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, ioworker0@...il.com
Subject: Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush
already synchronized
On 1/9/26 16:30, Lance Yang wrote:
>
>
> On 2026/1/9 22:13, David Hildenbrand (Red Hat) wrote:
>>
>>>> What could work is tracking "tlb_table_flush_sent_ipi" really when we
>>>> are flushing the TLB for removed/unshared tables, and maybe resetting
>>>> it ... I don't know when from the top of my head.
>>>
>>> Not sure what's the best way forward here :(
>>>
>>>>
>>>> v2 was simpler IMHO.
>>>
>>> The main concern Dave raised was that with PV hypercalls or when
>>> INVLPGB is available, we can't tell from a static check whether IPIs
>>> were actually sent.
>>
>> Why can't we set the boolean at runtime when initializing the pv_ops
>> structure, when we are sure that it is allowed?
>
> Yes, thanks, that sounds like a reasonable trade-off :)
>
> As you mentioned:
>
> "this lifetime stuff in core-mm ends up getting more complicated than
> v2 without a clear benefit".
>
> I totally agree that v3 is too complicated :(
>
> But Dave's concern about v2 was that we can't accurately tell whether
> IPIs were actually sent in PV environments or with INVLPGB, which
> misses optimization opportunities. The INVLPGB+no_global_asid case
> also sends IPIs during TLB flush.
>
> Anyway, yeah, I'd rather start with a simple approach, even if it's
> not perfect. We can always improve it later ;)
>
> Any ideas on how to move forward?

I'd hope Dave can comment :)

In general, I saw the whole thing as a two-step process:
1) Avoid IPIs completely when the TLB flush already sent them. We can
   achieve that through v2 or v3, one way or the other; I don't
   particularly care as long as it is clean and simple.
2) For other configs/arches, send IPIs only to CPUs that are actually in
   GUP-fast etc. That would resolve some RT headaches with broadcast IPIs.
Regarding 2), it obviously only applies to setups where 1) does not
apply: like x86 with INVLPGB or arm64.

I once had the idea of letting CPUs that enter/exit GUP-fast (and
similar) indicate in a global cpumask (or per-CPU variables) that
they are in that context. Then, we can just collect these CPUs and limit
the IPIs to them (usually not a lot ...).

The trick here is to not slow down GUP-fast too much. And one person
(Yair, in an RT context) who played with that was not able to reduce the
overhead sufficiently.

I guess the options are:
a) Per-MM CPU mask we have to update atomically when entering/leaving
   GUP-fast
b) Global mask we have to update atomically when entering/leaving GUP-fast
c) Per-CPU variable we have to update when entering/leaving GUP-fast.
   Interrupts are disabled, so we don't have to worry about reschedule etc.
Maybe someone reading along has other thoughts.
--
Cheers
David