linux-kernel - Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes


Message-ID: <1409344951af9427799bd28d7865c9ea7fa87ed3.camel@surriel.com>
Date: Fri, 10 Jan 2025 00:31:23 -0500
From: Rik van Riel <riel@...riel.com>
To: Dave Hansen <dave.hansen@...el.com>, x86@...nel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...a.com, 
	dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org, 
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com, 
	akpm@...ux-foundation.org, nadav.amit@...il.com,
 zhengqi.arch@...edance.com, 	linux-mm@...ck.org
Subject: Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes

On Thu, 2025-01-09 at 13:18 -0800, Dave Hansen wrote:
> 
> But actually I think INVLPGB is *WAY* better than INVLPG here. INVLPG
> doesn't have ranged invalidation. It will only architecturally
> invalidate multiple 4K entries when the hardware fractured them in the
> first place. I think we should probably take advantage of what INVLPGB
> can do instead of following the INVLPG approach.
> 
> INVLPGB will invalidate a range no matter where the underlying entries
> came from. Its "increment the virtual address at the 2M boundary" mode
> will invalidate entries of any size. That's my reading of the docs at
> least. Is that everyone else's reading too?

Ohhhh, good point! I glossed over that the first
half dozen times I was reading the document, because
I was trying to use the ASID and working to figure
out why things kept crashing (turns out I can only
use the PCID on bare metal).

> 
> So, let's pick a number "Z" which is >= invlpgb_count_max. Z could
> arguably be set to tlb_single_page_flush_ceiling. Then do this:
> 
> 	   4k -> Z*4k => use 4k step
> 	>Z*4k -> Z*2M => use 2M step
> 	>Z*2M	      => invalidate everything
> 
> Invalidations <=Z*4k are exact. They never zap extra TLB entries.
> 
> Invalidations that use the 2M step *might* unnecessarily zap some extra
> 4k mappings in the last 2M, but this is *WAY* better than invalidating
> everything.
> 
This is a great idea.

Then the code in get_flush_tlb_info can adjust
start, end, and stride_shift as needed.
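
A minimal userspace sketch of that adjustment, following the
three-tier table quoted above. This is not the kernel's
get_flush_tlb_info(); the names choose_flush, struct flush_choice,
and the SKETCH_* constants are hypothetical, and Z is passed in
as a plain parameter:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4k step */
#define SKETCH_PMD_SHIFT  21	/* 2M step */

/* Hypothetical result type, not the kernel's struct flush_tlb_info. */
struct flush_choice {
	uint64_t start;
	uint64_t end;
	unsigned int stride_shift;
	bool flush_all;
};

/* Pick a stride for the byte range [start, end) given threshold z. */
static struct flush_choice choose_flush(uint64_t start, uint64_t end, uint64_t z)
{
	const uint64_t pmd_mask = ((uint64_t)1 << SKETCH_PMD_SHIFT) - 1;
	struct flush_choice c = { start, end, SKETCH_PAGE_SHIFT, false };
	uint64_t len = end - start;

	/* Up to Z*4k: exact invalidation with the 4k step. */
	if (len <= (z << SKETCH_PAGE_SHIFT))
		return c;

	/* Up to Z*2M: widen to 2M boundaries and use the 2M step. */
	if (len <= (z << SKETCH_PMD_SHIFT)) {
		c.start = start & ~pmd_mask;
		c.end = (end + pmd_mask) & ~pmd_mask;
		c.stride_shift = SKETCH_PMD_SHIFT;
		return c;
	}

	/* Anything larger: invalidate everything. */
	c.flush_all = true;
	return c;
}
```

Note that widening an unaligned range to 2M boundaries can make it
span one more 2M step than Z. Real code would have to decide how
to account for that.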

INVLPGB also supports invalidation of an entire
1GB region, so we can take your idea one step
further :)

With up to 8 pages zapped by a single INVLPGB
instruction, and multiple in flight simultaneously,
maybe we could set the threshold to 64, for 8
INVLPGBs in flight at once?

That way we can invalidate up to 1/8th of a
512 entry range with individual zaps, before
just zapping the higher level entry.
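
The arithmetic behind that, as a small sketch. The limit of 8
pages per instruction is the figure assumed above (the thread
refers to the real limit as invlpgb_count_max), and the helper
name invlpgb_ops_needed is hypothetical:

```c
#include <stdint.h>

/* Assumed per-instruction page limit, taken from the figure in this mail. */
#define SKETCH_PAGES_PER_INVLPGB 8

/* Number of INVLPGB instructions needed to zap nr_pages individually. */
static uint64_t invlpgb_ops_needed(uint64_t nr_pages)
{
	return (nr_pages + SKETCH_PAGES_PER_INVLPGB - 1) / SKETCH_PAGES_PER_INVLPGB;
}
```

With a threshold of 64 pages this gives 8 instructions, and 64
is 1/8th of the 512 entries under one higher level entry.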

> "Invalidate everything" obviously stinks, but it should only be for
> pretty darn big invalidations. 

That would only come into play when we get
past several GB worth of invalidation.

-- 
All Rights Reversed.