[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d4cde50b4aab24612823714dfcbe69bc4bb63b60.camel@infradead.org>
Date: Mon, 20 Dec 2021 17:10:21 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Igor Mammedov <imammedo@...hat.com>
Cc: Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>,
"H . Peter Anvin" <hpa@...or.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"mimoja@...oja.de" <mimoja@...oja.de>,
"hewenliang4@...wei.com" <hewenliang4@...wei.com>,
"hushiyuan@...wei.com" <hushiyuan@...wei.com>,
"luolongjun@...wei.com" <luolongjun@...wei.com>,
"hejingxian@...wei.com" <hejingxian@...wei.com>
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64
On Fri, 2021-12-17 at 11:09 +0100, Igor Mammedov wrote:
> On Fri, 17 Dec 2021 00:13:16 +0000
> David Woodhouse <dwmw2@...radead.org> wrote:
>
> > On Thu, 2021-12-16 at 16:52 -0600, Tom Lendacky wrote:
> > > On baremetal, I haven't seen an issue. This only seems to have a problem
> > > with Qemu/KVM.
> > >
> > > With 191f08997577 I could boot without issues with and without the
> > > no_parallel_bringup. Only after I applied e78fa57dd642 did the failure happen.
> > >
> > > With e78fa57dd642 I could boot 64 vCPUs pretty consistently, but when I
> > > jumped to 128 vCPUs it failed again. When I moved the series to
> > > df9726cb7178, then 64 vCPUs also failed pretty consistently.
> > >
> > > Strange thing is it is random. Sometimes (rarely) it works on the first
> > > boot and then sometimes it doesn't, at which point it will reset and
> > > reboot 3 or 4 times and then make it past the failure and fully boot.
> >
> > Hm, some of that is just artifacts of timing, I'm sure. But now I'm
>
> that's most likely the case (there is a race somewhere left).
> To trigger CPU bringup (hotplug) races, I used to run QEMU guest with
> heavy vCPU overcommit. It helps to induce unexpected delays at CPU bringup
> time.
That last commit which actually enables parallel bringup does *two*
things. It makes the generic cpuhp code bring all the CPUs through all
the CPUHP_*_PREPARE stages and then actually brings them up. With that
test patch I sent, the bringup basically *wasn't* parallel any more;
they were using the trampoline lock all the way to the point where they
start waiting on cpu_callin_mask.
So maybe it's the 'prepare' ordering, like the x2apic one I already
fixed... but some weirdness that only triggers on some CPUs. Can we
back out the actual pseudo-parallel bringup and do *only* the prepare
part, by doing something like this on top...
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1337,7 +1337,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
int ret;
/* If parallel AP bringup isn't enabled, perform the first steps now. */
- if (!do_parallel_bringup) {
+ if (1 || !do_parallel_bringup) {
ret = do_cpu_up(cpu, tidle);
if (ret)
return ret;
@@ -1366,7 +1366,8 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
/* Bringup step one: Send INIT/SIPI to the target AP */
static int native_cpu_kick(unsigned int cpu)
{
- return do_cpu_up(cpu, idle_thread_get(cpu));
+ return 0;
+ // return do_cpu_up(cpu, idle_thread_get(cpu));
}
/**
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5174 bytes)
Powered by blists - more mailing lists