[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9b0b8ff8c72ef2c08fac68cea99f5cba9a30feab.camel@infradead.org>
Date: Tue, 21 Mar 2023 19:14:07 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>,
Usama Arif <usama.arif@...edance.com>, kim.phillips@....com,
brgerst@...il.com
Cc: piotrgorski@...hyos.org, oleksandr@...alenko.name,
arjan@...ux.intel.com, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
pbonzini@...hat.com, paulmck@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
fam.zheng@...edance.com, punit.agrawal@...edance.com,
simon.evans@...edance.com, liangma@...ngbit.com,
gpiccoli@...lia.com
Subject: Re: [PATCH v15 03/12] cpu/hotplug: Add dynamic parallel bringup
states before CPUHP_BRINGUP_CPU
On Mon, 2023-03-20 at 15:30 +0100, Thomas Gleixner wrote:
>
> This causes a subtle issue. The bringup loop above moves all CPUs to
> cpuhp_state == CPUHP_BP_PARALLEL_DYN_END. So the serial bootup will
> start from there and bring them fully up.
>
> Now if a bringup fails, then the rollback will only go back down to
> CPUHP_BP_PARALLEL_DYN_END, which means that the control CPU won't do any
> cleanups below CPUHP_BP_PARALLEL_DYN_END.
>
> That 'fail' is a common case for SMT soft disable via the 'nosmt'
> command line parameter. Due to the marvelous MCE broadcast 'feature' we
> need to bringup the SMT siblings at least to the CPUHP_AP_ONLINE_IDLE
> state once and then roll them back.
>
> While this is not necessarily a fatal problem, it's changing behaviour
> and with quite some of the details hidden in the (then not issued)
> teardown callbacks might cause some hard to decode subtle surprises.
>
> So that second for_each_present_cpu() loop needs to check the return
> value of cpu_up() and issue a full rollback to CPUHP_OFFLINE in case of
> fail.
@@ -1524,8 +1531,22 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
break;
- if (!cpu_online(cpu))
- cpu_up(cpu, CPUHP_ONLINE);
+ if (!cpu_online(cpu)) {
+ int ret = cpu_up(cpu, CPUHP_ONLINE);
+
+ /*
+ * For the parallel bringup case, roll all the way back
+ * to CPUHP_OFFLINE on failure; don't leave them in the
+ * parallel stages. This happens in the nosmt case for
+ * non-primary threads.
+ */
+ if (ret && cpuhp_hp_states[CPUHP_BP_PARALLEL_DYN].name) {
+ struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+ if (can_rollback_cpu(st))
+ WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
+ CPUHP_OFFLINE));
+ }
+ }
}
}
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists