lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bbe96ff4-3633-4008-b524-6b183a64caf6@intel.com>
Date: Mon, 6 Jan 2025 09:21:11 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Rik van Riel <riel@...riel.com>, x86@...nel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...a.com,
 dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
 akpm@...ux-foundation.org, nadav.amit@...il.com, zhengqi.arch@...edance.com,
 linux-mm@...ck.org
Subject: Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes

On 12/30/24 09:53, Rik van Riel wrote:
> Use broadcast TLB invalidation for kernel addresses when available.
> 
> This stops us from having to send IPIs for kernel TLB flushes.

Could this be changed to imperative voice, please?

	Remove the need to send IPIs for kernel TLB flushes.

> +static void broadcast_kernel_range_flush(unsigned long start, unsigned long end)
> +{
> +	unsigned long addr;
> +	unsigned long maxnr = invlpgb_count_max;
> +	unsigned long threshold = tlb_single_page_flush_ceiling * maxnr;

The 'tlb_single_page_flush_ceiling' value was determined by looking at
_local_ invalidation cost. Could you talk a bit about why it's also a
good value to use for remote invalidations? Does it hold up for INVLPGB
the same way it did for good ol' INVLPG? Has there been any explicit
testing here to find a good value?

I'm also confused by the multiplication here. Let's say
invlpgb_count_max==20 and tlb_single_page_flush_ceiling==30.

You would need to switch away from single-address invalidation when the
number of addresses is >20 for INVLPGB functional reasons. But you'd
also need to switch away when >30 for performance reasons
(tlb_single_page_flush_ceiling).

But I don't understand how that would make the threshold 20*30=600
invalidations.

> +	/*
> +	 * TLBSYNC only waits for flushes originating on the same CPU.
> +	 * Disabling migration allows us to wait on all flushes.
> +	 */

Imperative voice here too, please:

	Disable migration to wait on all flushes.

> +	guard(preempt)();
> +
> +	if (end == TLB_FLUSH_ALL ||
> +	    (end - start) > threshold << PAGE_SHIFT) {

This is basically a copy-and-paste of the "range vs. global" flush
logic, but taking 'invlpgb_count_max' into account.

It would be ideal if those limit checks could be consolidated. I suspect
that when the 'threshold' calculation above gets clarified that they may
be easier to consolidate.

BTW, what is a typical value for 'invlpgb_count_max'? Is it more or less
than the typical value for 'tlb_single_page_flush_ceiling'?

Maybe we should just lower 'tlb_single_page_flush_ceiling' if
'invlpgb_count_max' falls below it so we only have _one_ runtime value
to consider.


> +		invlpgb_flush_all();
> +	} else {
> +		unsigned long nr;
> +		for (addr = start; addr < end; addr += nr << PAGE_SHIFT) {
> +			nr = min((end - addr) >> PAGE_SHIFT, maxnr);
> +			invlpgb_flush_addr(addr, nr);
> +		}
> +	}
> +
> +	tlbsync();
> +}
> +
>  static void do_kernel_range_flush(void *info)
>  {
>  	struct flush_tlb_info *f = info;
> @@ -1089,6 +1115,11 @@ static void do_kernel_range_flush(void *info)
>  
>  void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>  {
> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
> +		broadcast_kernel_range_flush(start, end);
> +		return;
> +	}
> +
>  	/* Balance as user space task's flush, a bit conservative */
>  	if (end == TLB_FLUSH_ALL ||
>  	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {

I also wonder if this would all get simpler if we give in and *always*
call get_flush_tlb_info(). That would provide a nice single place to
consolidate the "all vs. ranged" flush logic.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ