Message-ID: <1409344951af9427799bd28d7865c9ea7fa87ed3.camel@surriel.com>
Date: Fri, 10 Jan 2025 00:31:23 -0500
From: Rik van Riel <riel@...riel.com>
To: Dave Hansen <dave.hansen@...el.com>, x86@...nel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...a.com,
dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
akpm@...ux-foundation.org, nadav.amit@...il.com,
zhengqi.arch@...edance.com, linux-mm@...ck.org
Subject: Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes
On Thu, 2025-01-09 at 13:18 -0800, Dave Hansen wrote:
>
> But actually I think INVLPGB is *WAY* better than INVLPG here. INVLPG
> doesn't have ranged invalidation. It will only architecturally
> invalidate multiple 4K entries when the hardware fractured them in the
> first place. I think we should probably take advantage of what INVLPGB
> can do instead of following the INVLPG approach.
>
> INVLPGB will invalidate a range no matter where the underlying entries
> came from. Its "increment the virtual address at the 2M boundary" mode
> will invalidate entries of any size. That's my reading of the docs at
> least. Is that everyone else's reading too?
Ohhhh, good point! I glossed over that the first
half dozen times I was reading the document, because
I was trying to use the ASID, and working to figure
out why things kept crashing (turns out I can only
use the PCID on bare metal).
>
> So, let's pick a number "Z" which is >= invlpgb_count_max. Z could
> arguably be set to tlb_single_page_flush_ceiling. Then do this:
>
> 4k -> Z*4k => use 4k step
> >Z*4k -> Z*2M => use 2M step
> >Z*2M => invalidate everything
>
> Invalidations <=Z*4k are exact. They never zap extra TLB entries.
>
> Invalidations that use the 2M step *might* unnecessarily zap some extra
> 4k mappings in the last 2M, but this is *WAY* better than invalidating
> everything.
>
This is a great idea.
Then the code in get_flush_tlb_info can adjust
start, end, and stride_shift as needed.
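
To make the three-tier scheme concrete, here is a rough userspace sketch of
how the range and stride could be picked. It is not the actual
get_flush_tlb_info code: the struct, the helper names, and the use of
stride_shift == 0 to mean "flush everything" are all made up for
illustration, and "z" is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT_4K 12
#define SHIFT_2M 21

/* Hypothetical container; the kernel's flush_tlb_info looks different. */
struct flush_range {
	uint64_t start, end;
	unsigned int stride_shift;	/* 0: give up and flush everything */
};

/* Number of 2^shift sized steps needed to cover [start, end). */
static uint64_t nr_steps(uint64_t start, uint64_t end, unsigned int shift)
{
	uint64_t mask = (1ULL << shift) - 1;

	return ((end + mask) >> shift) - (start >> shift);
}

/* Pick the smallest stride that covers the range in at most z steps. */
static struct flush_range pick_stride(uint64_t start, uint64_t end, uint64_t z)
{
	struct flush_range r = { start, end, SHIFT_4K };
	uint64_t mask_2m = (1ULL << SHIFT_2M) - 1;

	assert(end > start);

	/* Up to Z 4k pages: exact, no extra entries get zapped. */
	if (nr_steps(start, end, SHIFT_4K) <= z)
		return r;

	/* Up to Z 2M steps: widen to 2M boundaries, may zap extra 4k entries. */
	if (nr_steps(start, end, SHIFT_2M) <= z) {
		r.start = start & ~mask_2m;
		r.end = (end + mask_2m) & ~mask_2m;
		r.stride_shift = SHIFT_2M;
		return r;
	}

	/* Anything bigger: invalidate everything. */
	r.start = 0;
	r.end = UINT64_MAX;
	r.stride_shift = 0;
	return r;
}
```

Counting steps after aligning, rather than comparing raw byte sizes, keeps
an unaligned range from needing one step more than Z.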
INVLPGB also supports invalidation of an entire
1GB region, so we can take your idea one step
further :)
With up to 8 pages zapped by a single INVLPGB
instruction, and multiple in flight simultaneously,
maybe we could set the threshold to 64, for 8
INVLPGBs in flight at once?
That way we can invalidate up to 1/8th of a
512 entry range with individual zaps, before
just zapping the higher level entry.
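
Spelled out, a threshold of 64 would mean up to 64 * 4k = 256k is
flushed page by page, and 64 of the 512 entries under a higher level
entry is the 1/8th mentioned above. A minimal sketch of the selection
with a 1GB tier added, assuming the 1GB step works the way I described;
the names and the "0 means flush everything" convention are invented
for the example, and 64 is just the number floated here, not something
I have measured:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLUSH_ALL 0u	/* hypothetical marker for "invalidate everything" */
#define MAX_STEPS 64u	/* 8 INVLPGBs in flight * 8 pages each */

/* Candidate strides: 4k, 2M, 1G. */
static const unsigned int shifts[] = { 12, 21, 30 };

/* Return the smallest stride shift covering [start, end) in <= 64 steps. */
static unsigned int pick_stride_shift(uint64_t start, uint64_t end)
{
	assert(end > start);

	for (size_t i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
		uint64_t mask = (1ULL << shifts[i]) - 1;
		/* Steps needed once the range is widened to stride boundaries. */
		uint64_t steps = ((end + mask) >> shifts[i]) -
				 (start >> shifts[i]);

		if (steps <= MAX_STEPS)
			return shifts[i];
	}

	return FLUSH_ALL;
}
```

With those numbers, pick_stride_shift(0, 64 * 4096) stays on the 4k
step, while one more page moves the flush to the 2M step.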
> "Invalidate everything" obviously stinks, but it should only be for
> pretty darn big invalidations.
That would only come into play when we get
past several GB worth of invalidation.
--
All Rights Reversed.