Message-ID: <09b6eb12ede47b2e2be69bdd68a8732104b26eb0.camel@surriel.com>
Date: Tue, 06 May 2025 11:16:17 -0400
From: Rik van Riel <riel@...riel.com>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
 the arch/x86 maintainers <x86@...nel.org>, kernel-team@...a.com,
 Dave Hansen <dave.hansen@...ux.intel.com>, luto@...nel.org,
 peterz@...radead.org, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 "H. Peter Anvin" <hpa@...or.com>, Yu-cheng Yu <yu-cheng.yu@...el.com>
Subject: Re: [RFC PATCH 7/9] x86/mm: Introduce Remote Action Request

On Tue, 2025-05-06 at 09:59 +0300, Nadav Amit wrote:
> 
> 
> > On 6 May 2025, at 3:37, Rik van Riel <riel@...riel.com> wrote:
> > 
> > +void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
> > +		       unsigned long start, unsigned long end)
> > +{
> > +	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
> > +	int cpu, next_cpu, this_cpu = smp_processor_id();
> > +	cpumask_t *dest_mask;
> > +	unsigned long idx;
> > +
> > +	if (pages > RAR_INVLPG_MAX_PAGES || end == TLB_FLUSH_ALL)
> > +		pages = RAR_INVLPG_MAX_PAGES;
> > +
> > +	/*
> > +	 * Can deadlock when called with interrupts disabled.
> > +	 * We allow cpu's that are not yet online though, as no one else can
> > +	 * send smp call function interrupt to this cpu and as such deadlocks
> > +	 * can't happen.
> > +	 */
> > +	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> > +		     && !oops_in_progress && !early_boot_irqs_disabled);
> 
> To ease it for the reader, consider using the updated version from smp.c
> (or - even better - refactor into a common inline function):
> 
> 	if (cpu_online(this_cpu) && !oops_in_progress &&
> 	    !early_boot_irqs_disabled)
> 		lockdep_assert_irqs_enabled();

Nice cleanup. I will change this. Thank you.
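
A minimal sketch of what that shared helper could look like (the name
and placement here are hypothetical, not from the patch or smp.c):

	/* Hypothetical common helper for smp.c and the RAR path. */
	static inline void smp_cond_assert_irqs_enabled(int this_cpu)
	{
		/*
		 * Can deadlock when called with interrupts disabled.
		 * CPUs that are not yet online are allowed, since nobody
		 * else can send them smp call function interrupts yet.
		 */
		if (cpu_online(this_cpu) && !oops_in_progress &&
		    !early_boot_irqs_disabled)
			lockdep_assert_irqs_enabled();
	}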

> 
> 
> > +
> > +	/* Try to fastpath.  So, what's a CPU they want?  Ignoring this one. */
> > +	cpu = cpumask_first_and(mask, cpu_online_mask);
> > +	if (cpu == this_cpu)
> > +		cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> > +
> > +	/* No online cpus?  We're done. */
> > +	if (cpu >= nr_cpu_ids)
> > +		return;
> > +
> > +	/* Do we have another CPU which isn't us? */
> > +	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
> > +	if (next_cpu == this_cpu)
> > +		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
> > +
> > +	/* Fastpath: do that cpu by itself. */
> 
> If you follow my comment (suggestion) about the concurrent flushes, then
> this part should be handled in the same way as in the updated
> smp_call_function_many_cond().
> 
> IOW, the main difference between this path and the “slow path” is
> arch_send_rar_ipi_mask() vs arch_send_rar_single_ipi() (and maybe
> “and” with cpu_online_mask).

It gets better. Page 8 of the RAR whitepaper tells
us that we can simply use RAR to have a CPU send
itself TLB flush instructions, and the microcode
will do the flush at the same time the other CPUs
handle theirs.

"At this point, the ILP may invalidate its own TLB by 
 signaling RAR to itself in order to invoke the RAR handler
 locally as well"

I tried this, but things blew up very early in
boot, presumably due to the CPU trying to send
itself a RAR request before it was fully configured
to handle one.

The code may need a better decision point than
cpu_feature_enabled(X86_FEATURE_RAR) to decide
whether or not to use RAR.

Probably something that indicates RAR is actually
ready to use on all CPUs.
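
A minimal sketch of that idea, assuming a hypothetical flag that only
gets set once every CPU has its RAR handler set up:

	/* Hypothetical: set once all CPUs have initialized RAR. */
	static bool rar_fully_initialized __read_mostly;

	static inline bool can_use_rar(void)
	{
		/* Fall back to IPI flushing until RAR is ready everywhere. */
		return cpu_feature_enabled(X86_FEATURE_RAR) &&
		       READ_ONCE(rar_fully_initialized);
	}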

> 
> Since 2019 we have moved to “multi” TLB flush instead of “many”.
> 
> This means we try to take advantage of the time between sending the IPI
> and the indication that it was completed to do the local TLB flush.

I think we have 3 cases here:

1) Only the local TLB needs to be flushed.
   In this case we can INVPCID locally, and skip any
   potential contention on the RAR payload table.

2) Only one remote CPU needs to be flushed (no local).
   This can use arch_send_rar_single_ipi().

3) Multiple CPUs need to be flushed. This could include
   the local CPU, or be only multiple remote CPUs.
   For this case we could just use arch_send_rar_ipi_mask(),
   including sending a RAR request to the local CPU, which
   should handle it concurrently with the other CPUs.

Does that seem like a reasonable way to handle things?
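
To make those three cases concrete, roughly something like this
(arch_send_rar_ipi_mask() and arch_send_rar_single_ipi() are named in
the thread, but their exact signatures and the other names here are
assumptions for illustration only):

	static void rar_flush_dispatch(const struct cpumask *mask, u16 pcid,
				       unsigned long start, unsigned long end)
	{
		int this_cpu = smp_processor_id();

		if (cpumask_equal(mask, cpumask_of(this_cpu))) {
			/* 1) Local only: INVPCID, skip the RAR payload table. */
			local_flush_tlb_range(pcid, start, end); /* hypothetical */
		} else if (cpumask_weight(mask) == 1) {
			/* 2) A single remote CPU, nothing local. */
			arch_send_rar_single_ipi(cpumask_first(mask));
		} else {
			/*
			 * 3) Multiple CPUs, possibly including this one; the
			 * local CPU handles its own RAR request concurrently
			 * with the rest.
			 */
			arch_send_rar_ipi_mask(mask);
		}
	}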

> > +
> > +	for_each_cpu(cpu, dest_mask)
> > +		wait_for_done(idx, cpu);
> > +
> > +	free_payload(idx);
> > +	unlock(this_cpu_ptr(&rar_lock));
> 
> We don’t do lock/unlock in kernel/smp.c, so I would expect at least a
> comment as to why it is required.
> 
That is a very good question!

It is locking a per-cpu lock, which no other code
path takes.

It looks like it could protect against preemption
on a kernel with full preemption enabled, but that
should not be needed, since the code in arch/x86/mm/tlb.c
disables preemption around every call to the RAR code.

I suspect that lock is no longer needed, but maybe
somebody at Intel has a reason why we still take it?
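
For reference, the calling pattern described above would look roughly
like this (a sketch of the caller side, not the actual tlb.c code):

	/*
	 * Preemption is already disabled around the RAR call, so a
	 * per-cpu lock taken only inside smp_call_rar_many() would not
	 * be protecting against migration to another CPU.
	 */
	preempt_disable();
	smp_call_rar_many(mask, pcid, start, end);
	preempt_enable();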

-- 
All Rights Reversed.
