[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZD1q3TF2ixVD1f2M@FVFF77S0Q05N>
Date: Mon, 17 Apr 2023 16:50:53 +0100
From: Mark Rutland <mark.rutland@....com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
David Woodhouse <dwmw@...radead.org>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Brian Gerst <brgerst@...il.com>,
Arjan van de Veen <arjan@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Paul McKenney <paulmck@...nel.org>,
Tom Lendacky <thomas.lendacky@....com>,
Sean Christopherson <seanjc@...gle.com>,
Oleksandr Natalenko <oleksandr@...alenko.name>,
Paul Menzel <pmenzel@...gen.mpg.de>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Piotr Gorski <lucjan.lucjanov@...il.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
linux-arm-kernel@...ts.infradead.org,
David Woodhouse <dwmw@...zon.co.uk>,
Usama Arif <usama.arif@...edance.com>,
Juergen Gross <jgross@...e.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
xen-devel@...ts.xenproject.org,
Russell King <linux@...linux.org.uk>,
Arnd Bergmann <arnd@...db.de>, Guo Ren <guoren@...nel.org>,
linux-csky@...r.kernel.org,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
linux-mips@...r.kernel.org,
"James E.J. Bottomley" <James.Bottomley@...senPartnership.com>,
Helge Deller <deller@....de>, linux-parisc@...r.kernel.org,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
linux-riscv@...ts.infradead.org, Sabin Rapan <sabrapan@...zon.com>
Subject: Re: [patch 22/37] arm64: smp: Switch to hotplug core state
synchronization
On Sat, Apr 15, 2023 at 01:44:49AM +0200, Thomas Gleixner wrote:
> Switch to the CPU hotplug core state tracking and synchronization
> mechanim. No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Will Deacon <will@...nel.org>
> Cc: linux-arm-kernel@...ts.infradead.org
I gave this a spin on arm64 (in a 64-vCPU VM on an M1 host), and it seems to
work fine with a bunch of vCPUs being hotplugged off and on again randomly.
FWIW:
Tested-by: Mark Rutland <mark.rutland@....com>
I also hacked the code to have the dying CPU spin forever before the call to
cpuhp_ap_report_dead(). In that case I see a warning, and that we don't call
arch_cpuhp_cleanup_dead_cpu(), and that the CPU is marked as offline (per
/sys/devices/system/cpu/$N/online).
As a tangent/aside, we might need to improve that for confidential compute
architectures, and we might want to generically track cpus which might still be
using kernel text/data. On arm64 we ensure that via our cpu_kill() callback
(which'll use PSCI CPU_AFFINITY_INFO), but I'm not sure if TDX and/or SEV-SNP
have a similar mechanism.
Otherwise, a malicious hypervisor can pause a vCPU just before it leaves the
kernel (e.g. immediately after the arch_cpuhp_cleanup_dead_cpu() call), wait
for a kexec (or resuse of stack memroy), and unpause the vCPU to cause things
to blow up.
Thanks,
Mark.
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/smp.h | 2 +-
> arch/arm64/kernel/smp.c | 14 +++++---------
> 3 files changed, 7 insertions(+), 10 deletions(-)
>
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -216,6 +216,7 @@ config ARM64
> select HAVE_KPROBES
> select HAVE_KRETPROBES
> select HAVE_GENERIC_VDSO
> + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
> select IRQ_DOMAIN
> select IRQ_FORCED_THREADING
> select KASAN_VMALLOC if KASAN
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_
>
> extern int __cpu_disable(void);
>
> -extern void __cpu_die(unsigned int cpu);
> +static inline void __cpu_die(unsigned int cpu) { }
> extern void cpu_die(void);
> extern void cpu_die_early(void);
>
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -333,17 +333,13 @@ static int op_cpu_kill(unsigned int cpu)
> }
>
> /*
> - * called on the thread which is asking for a CPU to be shutdown -
> - * waits until shutdown has completed, or it is timed out.
> + * Called on the thread which is asking for a CPU to be shutdown after the
> + * shutdown completed.
> */
> -void __cpu_die(unsigned int cpu)
> +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
> {
> int err;
>
> - if (!cpu_wait_death(cpu, 5)) {
> - pr_crit("CPU%u: cpu didn't die\n", cpu);
> - return;
> - }
> pr_debug("CPU%u: shutdown\n", cpu);
>
> /*
> @@ -370,8 +366,8 @@ void cpu_die(void)
>
> local_daif_mask();
>
> - /* Tell __cpu_die() that this CPU is now safe to dispose of */
> - (void)cpu_report_death();
> + /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
> + cpuhp_ap_report_dead();
>
> /*
> * Actually shutdown the CPU. This must never fail. The specific hotplug
>
Powered by blists - more mailing lists