[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZBrhKERLxvklAhiP@FVFF77S0Q05N>
Date: Wed, 22 Mar 2023 11:06:16 +0000
From: Mark Rutland <mark.rutland@....com>
To: Usama Arif <usama.arif@...edance.com>
Cc: dwmw2@...radead.org, tglx@...utronix.de, kim.phillips@....com,
brgerst@...il.com, piotrgorski@...hyos.org,
oleksandr@...alenko.name, arjan@...ux.intel.com, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
x86@...nel.org, pbonzini@...hat.com, paulmck@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
fam.zheng@...edance.com, punit.agrawal@...edance.com,
simon.evans@...edance.com, liangma@...ngbit.com,
gpiccoli@...lia.com, David Woodhouse <dwmw@...zon.co.uk>
Subject: Re: [PATCH v16 2/8] cpu/hotplug: Reset task stack state in _cpu_up()
On Tue, Mar 21, 2023 at 07:40:02PM +0000, Usama Arif wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
>
> Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()")
> ensured that the shadow call stack and KASAN poisoning were removed from
> a CPU's stack each time that CPU is brought up, not just once.
>
> This is not incorrect. However, with parallel bringup, an architecture
> may obtain the idle thread for a new CPU from a pre-bringup stage, by
> calling idle_thread_get() for itself. This would mean that the cleanup
> in bringup_cpu() would be too late.
>
> Move the SCS/KASAN cleanup to the generic _cpu_up() function instead,
> which already ensures that the new CPU's stack is available, purely to
> allow for early failure. This occurs when the CPU to be brought up is
> in the CPUHP_OFFLINE state, which should correctly do the cleanup any
> time the CPU has been taken down to the point where such is needed.
>
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
This all sounds fine to me, and the patch itself looks good.
I built an arm64 kernel with the first three patches from this series applied
atop v6.3-rc3, with defconfig + CONFIG_SHADOW_CALL_STACK=y +
CONFIG_KASAN_INLINE=y + CONFIG_KASAN_STACK=y. I then hotplugged a cpu with:
while true; do
echo 0 > /sys/devices/system/cpu/cpu1/online;
echo 1 > /sys/devices/system/cpu/cpu1/online;
done
... and that was perfectly happy to run for minutes with no unexpected failures.
To make sure I wasn't avoiding issues by chance, I also tried with each of
scs_task_reset() and kasan_unpoison_task_stack() commented out. With
scs_task_reset() commented out, cpu re-onlining fails after a few iterations,
and with kasan_unpoison_task_stack() commented out I get a KASAN splat upon the
first re-onlining. So that all looks good.
FWIW, for the first three patches:
Reviewed-by: Mark Rutland <mark.rutland@....com>
Tested-by: Mark Rutland <mark.rutland@....com> [arm64]
Mark.
> ---
> kernel/cpu.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6c0a92ca6bb5..43e0a77f21e8 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -591,12 +591,6 @@ static int bringup_cpu(unsigned int cpu)
> struct task_struct *idle = idle_thread_get(cpu);
> int ret;
>
> - /*
> - * Reset stale stack state from the last time this CPU was online.
> - */
> - scs_task_reset(idle);
> - kasan_unpoison_task_stack(idle);
> -
> /*
> * Some architectures have to walk the irq descriptors to
> * setup the vector space for the cpu which comes online.
> @@ -1383,6 +1377,12 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
> ret = PTR_ERR(idle);
> goto out;
> }
> +
> + /*
> + * Reset stale stack state from the last time this CPU was online.
> + */
> + scs_task_reset(idle);
> + kasan_unpoison_task_stack(idle);
> }
>
> cpuhp_tasks_frozen = tasks_frozen;
> --
> 2.25.1
>
Powered by blists - more mailing lists