linux-kernel - Re: [PATCH v11 08/12] x86/mm: use broadcast TLB flushing for page reclaim TLB flushing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cfa6e15f-3fe5-42bc-a877-fd46bb3c9f88@intel.com>
Date: Fri, 14 Feb 2025 10:51:58 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Rik van Riel <riel@...riel.com>, x86@...nel.org
Cc: linux-kernel@...r.kernel.org, bp@...en8.de, peterz@...radead.org,
 dave.hansen@...ux.intel.com, zhengqi.arch@...edance.com,
 nadav.amit@...il.com, thomas.lendacky@....com, kernel-team@...a.com,
 linux-mm@...ck.org, akpm@...ux-foundation.org, jackmanb@...gle.com,
 jannh@...gle.com, mhklinux@...look.com, andrew.cooper3@...rix.com,
 Manali Shukla <Manali.Shukla@....com>
Subject: Re: [PATCH v11 08/12] x86/mm: use broadcast TLB flushing for page
 reclaim TLB flushing

On 2/13/25 08:13, Rik van Riel wrote:
> In the page reclaim code, we only track the CPU(s) where the TLB needs
> to be flushed, rather than all the individual mappings that may be getting
> invalidated.
> 
> Use broadcast TLB flushing when that is available.

The changelog here is a little light. This patch is doing a *ton* of stuff.

The existing code has two cases where it is doing a full TLB flush, not
a ranged flush.

	1. An actual IPI to some CPUs in batch->cpumask
	2. A local flush, no IPI

The change here eliminates both of those options, even the "common case"
which is not sending an IPI at all. So this replaces a CPU-local (aka. 1
logical CPU) TLB flush with a broadcast to the *ENTIRE* system. That's a
really really big change to not be noted. It's not something that's an
obvious win to me.

> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3c29ef25dce4..de3f6e4ed16d 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1316,7 +1316,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>  	 * a local TLB flush is needed. Optimize this use-case by calling
>  	 * flush_tlb_func_local() directly in this case.
>  	 */
> -	if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> +		invlpgb_flush_all_nonglobals();
> +	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
>  		flush_tlb_multi(&batch->cpumask, info);
>  	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
>  		lockdep_assert_irqs_enabled();

The structure of the code is also a bit off to me. O'd kinda prefer that
we stick the pattern of (logically):

	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
		invlpgb_...();
	} else {
		on_each_cpu*();
	}

This patch is going a couple of functions up in the call chain above the
on_each_cpu()'s.

It would be more consistent with the previous modifications in this
series if the X86_FEATURE_INVLPGB check was in native_flush_tlb_multi(),
instead.

Would that make sense here? It would also preserve the "common case"
optimization that's in arch_tlbbatch_flush().