[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <efbe0d3d92e6c279e3a6d7a4191ca7470bc4beec.camel@infradead.org>
Date: Wed, 29 Dec 2021 13:54:54 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Paul Menzel <pmenzel@...gen.mpg.de>,
Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
hushiyuan@...wei.com, luolongjun@...wei.com, hejingxian@...wei.com
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64
On Wed, 2021-12-29 at 14:18 +0100, Paul Menzel wrote:
> > Or the one in
> > https://lore.kernel.org/lkml/d4cde50b4aab24612823714dfcbe69bc4bb63b60.camel@infradead.org
> >
> > which makes it do nothing except prepare all the CPUs before bringing
> > them up one at a time?
>
> I applied it on top the other one, and it made no difference either.
It's possible I missed something else in the prepare stage that doesn't
cope with all CPUs being prepared first.
My next attempt might be to change the loop in bringup_nonboot_cpus()
to bring all the CPUs not to the CPUHP_BP_PARALLEL_DYN state(s) but
instead just bring them to somewhere like CPUHP_RCUTREE_PREP, which is
somewhere in the middle between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.
Then a binary chop search — if that one boots, try maybe
CPUHP_TOPOLOGY_PREPARE. And if not, try CPUHP_PROFILE_PREPARE. Etc.
> > My current theory (not that I've spent that much time thinking about it
> > in the last week) is that there's something about the existing CPU
> > bringup, possibly a CPU bug or something special about the AMD CPUs,
> > which is triggered by just making it a little bit *faster*, which is
> > why bringing them up from kexec (especially in qemu) can cause it too?
>
> Would having the serial console enabled make a difference?
>
Yes. I couldn't make this fail in my EC2 m6a instance (for clean boots;
I have never managed to kexec it) until I turned off the serial console
to make things go faster.
> > Tom seemed to find that it was in load_TR_desc(), so if you could try
> > this hack on a machine that doesn't magically wink out of existence on
> > a triplefault before even flushing its serial output, that would be
> > much appreciated...
> Unfortunately, no more messages were printed on the serial console.
I suppose we need to litter those outputs somewhere earlier in the
trampoline then, perhaps it *isn't* getting to load_TR_desc() in your
case?
Will be back online properly next week and can actually provide some of
the above suggestions in patch form if you're willing to keep testing.
Thanks!
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5174 bytes)
Powered by blists - more mailing lists