Message-ID: <bbfdf226-4660-4949-b17b-0d209ee4ef8c@kernel.org>
Date: Fri, 9 Jan 2026 16:40:19 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Lance Yang <lance.yang@...ux.dev>, dave.hansen@...el.com
Cc: dave.hansen@...ux.intel.com, will@...nel.org, aneesh.kumar@...nel.org,
npiggin@...il.com, peterz@...radead.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, x86@...nel.org, hpa@...or.com,
arnd@...db.de, akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
ziy@...dia.com, baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com,
npache@...hat.com, ryan.roberts@....com, dev.jain@....com,
baohua@...nel.org, shy828301@...il.com, riel@...riel.com, jannh@...gle.com,
linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, ioworker0@...il.com
Subject: Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush
already synchronized
On 1/9/26 16:30, Lance Yang wrote:
>
>
> On 2026/1/9 22:13, David Hildenbrand (Red Hat) wrote:
>>
>>>> What could work is tracking "tlb_table_flush_sent_ipi" really when we
>>>> are flushing the TLB for removed/unshared tables, and maybe resetting
>>>> it ... I don't know when from the top of my head.
>>>
>>> Not sure what's the best way forward here :(
>>>
>>>>
>>>> v2 was simpler IMHO.
>>>
>>> The main concern Dave raised was that with PV hypercalls or when
>>> INVLPGB is available, we can't tell from a static check whether IPIs
>>> were actually sent.
>>
>> Why can't we set the boolean at runtime when initializing the pv_ops
>> structure, when we are sure that it is allowed?
>
> Yes, thanks, that sounds like a reasonable trade-off :)
>
> As you mentioned:
>
> "this lifetime stuff in core-mm ends up getting more complicated than
> v2 without a clear benefit".
>
> I totally agree that v3 is too complicated :(
>
> But Dave's concern about v2 was that we can't accurately tell whether
> IPIs were actually sent in PV environments or with INVLPGB, which
> misses optimization opportunities. The INVLPGB+no_global_asid case
> also sends IPIs during TLB flush.
>
> Anyway, yeah, I'd rather start with a simple approach, even if it's
> not perfect. We can always improve it later ;)
>
> Any ideas on how to move forward?

I'd hope Dave can comment :)

In general, I saw the whole thing as a two-step process:
1) Avoid IPIs completely when the TLB flush already sent them. We can
   achieve that through v2 or v3, one way or the other; I don't
   particularly care as long as it is clean and simple.
2) For other configs/arches, send IPIs only to CPUs that are actually in
   GUP-fast etc. That would resolve some RT headaches with broadcast IPIs.
Regarding 2), it obviously only applies to setups where 1) does not
apply: like x86 with INVLPGB or arm64.

I once had the idea of letting CPUs that enter/exit GUP-fast (and
similar) indicate in a global cpumask (or per-CPU variables) that
they are in that context. Then, we can just collect these CPUs and limit
the IPIs to them (usually not a lot ...).

The trick here is to not slow down GUP-fast too much. And one person
(Yair, in an RT context) who played with that was not able to reduce the
overhead sufficiently.

I guess the options are:
a) Per-MM CPU mask we have to update atomically when entering/leaving
   GUP-fast
b) Global mask we have to update atomically when entering/leaving GUP-fast
c) Per-CPU variable we have to update when entering/leaving GUP-fast.
   Interrupts are disabled, so we don't have to worry about reschedule etc.
Maybe someone reading along has other thoughts.
--
Cheers
David