Message-ID: <cea71c01-68e7-4f7f-9931-017109d95ef0@intel.com>
Date: Fri, 2 Jan 2026 08:41:50 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>,
 Lance Yang <lance.yang@...ux.dev>, akpm@...ux-foundation.org
Cc: will@...nel.org, aneesh.kumar@...nel.org, npiggin@...il.com,
 peterz@...radead.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, arnd@...db.de,
 lorenzo.stoakes@...cle.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com,
 Liam.Howlett@...cle.com, npache@...hat.com, ryan.roberts@....com,
 dev.jain@....com, baohua@...nel.org, ioworker0@...il.com,
 shy828301@...il.com, riel@...riel.com, jannh@...gle.com,
 linux-arch@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] skip redundant TLB sync IPIs

On 12/31/25 04:33, David Hildenbrand (Red Hat) wrote:
> On 12/31/25 05:26, Dave Hansen wrote:
>> On 12/29/25 06:52, Lance Yang wrote:
>> ...
>>> This series introduces a way for architectures to indicate their TLB
>>> flush
>>> already provides full synchronization, allowing the redundant IPI to be
>>> skipped. For now, the optimization is implemented for x86 first and
>>> applied
>>> to all page table operations that free or unshare tables.
>>
>> I really don't like all the complexity here. Even on x86, there are
>> three or more ways of deriving this. Having the pv_ops check the value
>> of another pv op is also a bit unsettling.
> 
> Right. What I actually meant is that we simply have a property "bool
> flush_tlb_multi_implies_ipi_broadcast" that we set only to true from the
> initialization code.
> 
> Without comparing the pv_ops.
> 
> That should reduce the complexity quite a bit IMHO.

Yeah, that sounds promising.
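
A rough sketch of what such a property could look like, written as a
userspace model rather than kernel code. Only the name
flush_tlb_multi_implies_ipi_broadcast comes from the discussion above;
the model_* helpers and the split into two init hooks are assumptions
made purely for illustration:

```c
#include <stdbool.h>

/*
 * Userspace model, not kernel code. The flag is written only from
 * initialization code and is read-only afterwards.
 */
static bool flush_tlb_multi_implies_ipi_broadcast;

/* Hypothetical init hook: remote TLB flushes are done by sending IPIs. */
void model_init_ipi_based_flush(void)
{
	flush_tlb_multi_implies_ipi_broadcast = true;
}

/*
 * Hypothetical init hook: remote flushes go through a paravirt or
 * hardware broadcast mechanism, so no IPI is implied.
 */
void model_init_broadcast_flush(void)
{
	flush_tlb_multi_implies_ipi_broadcast = false;
}

/* Callers read the property; nothing compares pv_ops pointers. */
bool model_need_extra_sync_ipi(void)
{
	return !flush_tlb_multi_implies_ipi_broadcast;
}
```

The point of the model is that the decision is made once at init time
and call sites only test a boolean.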

> But maybe you have an even better way on how to indicate support, in a
> very simple way.

Rather than having some kind of explicit support enumeration, the other
idea I had would be to actually track the state about what needs to get
flushed somewhere. For instance, even CPUs with enabled INVLPGB support
still use IPIs sometimes. That makes the
tlb_table_flush_implies_ipi_broadcast() check a bit imperfect as is
because it will force the extra sync IPI even when INVLPGB isn't being
used for an mm.

First, we already save some semblance of support for doing different
flushes when freeing page tables in mmu_gather->freed_tables. But the call
sites in question here are for a single flush and don't use mmu_gathers.

The other pretty straightforward thing to do would be to add something
to mm->context that indicates that page tables need to be freed but
there might still be wild gup walkers out there that need an IPI. It
would get set when the page tables are modified and cleared at all the
sites where IPIs are sent.
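
To make the idea concrete, here is a minimal userspace model of such a
flag. It is a sketch under stated assumptions, not kernel code: the
struct, the field name gup_sync_pending, and the model_* helpers are all
hypothetical, and the model ignores which CPUs an IPI actually reached
as well as the memory ordering a real implementation would have to get
right.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of mm->context. */
struct model_mm_context {
	/*
	 * True when page tables were modified and GUP-fast walkers may
	 * still be looking at them, i.e. a sync IPI is still owed.
	 */
	atomic_bool gup_sync_pending;
};

/* Call where page tables are modified (unshared, collapsed, freed). */
void model_note_table_change(struct model_mm_context *ctx)
{
	assert(ctx != NULL);
	atomic_store(&ctx->gup_sync_pending, true);
}

/* Call at every site that actually sent IPIs. */
void model_note_ipi_sent(struct model_mm_context *ctx)
{
	assert(ctx != NULL);
	atomic_store(&ctx->gup_sync_pending, false);
}

/*
 * Returns true if the caller still has to send the sync IPI. The flag
 * is cleared in the same step, so the caller is expected to send it.
 */
bool model_need_sync_ipi(struct model_mm_context *ctx)
{
	assert(ctx != NULL);
	return atomic_exchange(&ctx->gup_sync_pending, false);
}
```

With this shape, a flush that happened to use IPIs clears the flag and
the later sync is skipped, while a flush that sent no IPIs (INVLPGB, for
example) leaves the flag set and the sync IPI still goes out.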


>> That said, complexity can be worth it with sufficient demonstrated
>> gains. But:
>>
>>> When unsharing hugetlb PMD page tables or collapsing pages in
>>> khugepaged,
>>> we send two IPIs: one for TLB invalidation, and another to synchronize
>>> with concurrent GUP-fast walkers.
>>
>> Those aren't exactly hot paths. khugepaged is fundamentally rate
>> limited. I don't think unsharing hugetlb PMD page tables just is all
>> that common either.
> 
> Given that the added IPIs during unsharing broke Oracle DBs rather badly
> [1], I think this is actually a case worth optimizing.
...
> [1] https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org

Gah, that's good context, thanks.

Are there any tests out there that might catch this case better?
It might be something good to have 0day watch for.
