linux-kernel - Re: [PATCH v5 10/12] x86,tlb: do targeted broadcast flushing from tlbbatch code

Message-Id: <5501C6EC-38C7-472F-B129-F7A5144C3096@gmail.com>
Date: Mon, 20 Jan 2025 20:56:57 +0200
From: Nadav Amit <nadav.amit@...il.com>
To: Rik van Riel <riel@...riel.com>
Cc: the arch/x86 maintainers <x86@...nel.org>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 Borislav Petkov <bp@...en8.de>,
 peterz@...radead.org,
 Dave Hansen <dave.hansen@...ux.intel.com>,
 zhengqi.arch@...edance.com,
 thomas.lendacky@....com,
 kernel-team@...a.com,
 "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 jannh@...gle.com,
 mhklinux@...look.com,
 andrew.cooper3@...rix.com
Subject: Re: [PATCH v5 10/12] x86,tlb: do targeted broadcast flushing from
 tlbbatch code



> On 20 Jan 2025, at 19:56, Rik van Riel <riel@...riel.com> wrote:
> 
> How would you keep track of CPUs where the tlbsync
> has NOT happened before arch_tlbbatch_flush()?
> 
> That part seems to be missing still.

You only keep track of whether a TLBSYNC is pending on *your* CPU. There is no
need to track whether other CPUs did not issue a TLBSYNC during
arch_tlbbatch_add_pending(). If the process that does the reclamation is
migrated, a TLBSYNC is issued during the context switch, before the thread
that does the reclamation has any chance of being scheduled elsewhere.

I hope these code changes on top of yours make it clearer:

> +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> +					     struct mm_struct *mm,
> +					     unsigned long uaddr)
> +{
> +	if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_global_asid(mm)) {
> +		u16 asid = mm_global_asid(mm);
> +		/*
> +		 * Queue up an asynchronous invalidation. The corresponding
> +		 * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
> +		 * on the same CPU.
> +		 */

#if 0 		// remove
> +		if (!batch->used_invlpgb) {
> +			batch->used_invlpgb = true;
> +			migrate_disable();
> +		}
#endif

		 batch->used_invlpgb = true;
		 preempt_disable();

> +		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
> +		/* Do any CPUs supporting INVLPGB need PTI? */
> +		if (static_cpu_has(X86_FEATURE_PTI))
> +			invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);

		 this_cpu_write(cpu_tlbstate.pending_tlbsync, true);
		 preempt_enable();
> +
> +		/*
> +		 * Some CPUs might still be using a local ASID for this
> +		 * process, and require IPIs, while others are using the
> +		 * global ASID.
> +		 *
> +		 * In this corner case we need to do both the broadcast
> +		 * TLB invalidation, and send IPIs. The IPIs will help
> +		 * stragglers transition to the broadcast ASID.
> +		 */
> +		if (READ_ONCE(mm->context.asid_transition))
> +			goto also_send_ipi;
> +	} else {
> +also_send_ipi:
> +		inc_mm_tlb_gen(mm);
> +		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> +	}
> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> +}
> +

Then in switch_mm_irqs_off():

	if (this_cpu_read(cpu_tlbstate.pending_tlbsync))
		tlbsync();

Note that when switch_mm_irqs_off() is called from context_switch() during a
context switch, finish_task_switch() has not yet taken place, so the task
cannot be scheduled on other cores.
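
To make the ordering argument concrete, here is a rough user-space model of the
scheme. It is only a sketch, not kernel code: every toy_* name is hypothetical,
the TLBSYNC instruction is replaced by a counter, and two details are my
assumptions rather than something shown in the snippets above: that the flag is
cleared once the TLBSYNC has run, and that arch_tlbbatch_flush() checks the
same per-CPU flag.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for the per-CPU cpu_tlbstate; only the flag discussed here. */
struct toy_cpu_state {
	bool pending_tlbsync;	/* INVLPGB issued on this CPU, no TLBSYNC yet */
	unsigned int tlbsyncs;	/* number of times the stubbed TLBSYNC ran */
};

static struct toy_cpu_state toy_cpus[TOY_NR_CPUS];

/* Stub for TLBSYNC. Assumption: the pending flag is cleared afterwards. */
static void toy_tlbsync(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpus[cpu].tlbsyncs++;
	toy_cpus[cpu].pending_tlbsync = false;
}

/* Models arch_tlbbatch_add_pending(): INVLPGB, then note it on this CPU. */
static void toy_add_pending(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	/* invlpgb_flush_user_nr_nosync() would be issued here. */
	toy_cpus[cpu].pending_tlbsync = true;
}

/* Models the check added to switch_mm_irqs_off(). */
static void toy_switch_mm(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_cpus[cpu].pending_tlbsync)
		toy_tlbsync(cpu);
}

/* Assumed shape of arch_tlbbatch_flush(): sync only if this CPU needs it. */
static void toy_batch_flush(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_cpus[cpu].pending_tlbsync)
		toy_tlbsync(cpu);
}

/*
 * Migration: the task can only resume on 'to' after 'from' has gone
 * through a context switch, so the TLBSYNC on 'from' happens first.
 */
static int toy_migrate(int from, int to)
{
	assert(to >= 0 && to < TOY_NR_CPUS);
	toy_switch_mm(from);
	return to;
}
```

In this model, a task that calls toy_add_pending(0), is moved with
toy_migrate(0, 1) and then calls toy_batch_flush(1) ends up with exactly one
TLBSYNC on CPU 0 and none on CPU 1, which is the property that removes the
need for migrate_disable().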