Date:   Tue, 21 Mar 2023 19:14:07 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Usama Arif <usama.arif@...edance.com>, kim.phillips@....com,
        brgerst@...il.com
Cc:     piotrgorski@...hyos.org, oleksandr@...alenko.name,
        arjan@...ux.intel.com, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
        pbonzini@...hat.com, paulmck@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
        thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
        fam.zheng@...edance.com, punit.agrawal@...edance.com,
        simon.evans@...edance.com, liangma@...ngbit.com,
        gpiccoli@...lia.com
Subject: Re: [PATCH v15 03/12] cpu/hotplug: Add dynamic parallel bringup
 states before CPUHP_BRINGUP_CPU

On Mon, 2023-03-20 at 15:30 +0100, Thomas Gleixner wrote:
> 
> This causes a subtle issue. The bringup loop above moves all CPUs to
> cpuhp_state == CPUHP_BP_PARALLEL_DYN_END. So the serial bootup will
> start from there and bring them fully up.
> 
> Now if a bringup fails, then the rollback will only go back down to
> CPUHP_BP_PARALLEL_DYN_END, which means that the control CPU won't do any
> cleanups below CPUHP_BP_PARALLEL_DYN_END.
> 
> That 'fail' is a common case for SMT soft disable via the 'nosmt'
> command line parameter. Due to the marvelous MCE broadcast 'feature' we
> need to bringup the SMT siblings at least to the CPUHP_AP_ONLINE_IDLE
> state once and then roll them back.
> 
> While this is not necessarily a fatal problem, it's changing behaviour
> and with quite some of the details hidden in the (then not issued)
> teardown callbacks might cause some hard to decode subtle surprises.
> 
> So that second for_each_present_cpu() loop needs to check the return
> value of cpu_up() and issue a full rollback to CPUHP_OFFLINE in case of
> fail.

@@ -1524,8 +1531,22 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
        for_each_present_cpu(cpu) {
                if (num_online_cpus() >= setup_max_cpus)
                        break;
-               if (!cpu_online(cpu))
-                       cpu_up(cpu, CPUHP_ONLINE);
+               if (!cpu_online(cpu)) {
+                       int ret = cpu_up(cpu, CPUHP_ONLINE);
+
+                       /*
+                        * For the parallel bringup case, roll all the way back
+                        * to CPUHP_OFFLINE on failure; don't leave them in the
+                        * parallel stages. This happens in the nosmt case for
+                        * non-primary threads.
+                        */
+                       if (ret && cpuhp_hp_states[CPUHP_BP_PARALLEL_DYN].name) {
+                               struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+                               if (can_rollback_cpu(st))
+                                       WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
+                                                                           CPUHP_OFFLINE));
+                       }
+               }
        }
 }
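
For anyone less familiar with the hotplug state machine, the failure mode
and the effect of the hunk above can be sketched with a toy userspace
model. This is only an illustration under stated assumptions: the toy_*
names, the six-step state ladder and the setup_done[] bookkeeping are all
hypothetical and are not kernel code. The only behaviour it mirrors is
that a failed serial bringup rolls back to the state it started from, and
that the extra rollback then continues down to offline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily shortened ladder; the real enum cpuhp_state is much longer. */
enum toy_state {
	TOY_OFFLINE = 0,
	TOY_PREPARE,		/* stands in for the states below the parallel range */
	TOY_PARALLEL_DYN_END,	/* stands in for CPUHP_BP_PARALLEL_DYN_END */
	TOY_BRINGUP_CPU,
	TOY_ONLINE_IDLE,	/* stands in for CPUHP_AP_ONLINE_IDLE */
	TOY_ONLINE,
	TOY_NR_STATES
};

struct toy_cpu {
	enum toy_state state;
	bool setup_done[TOY_NR_STATES];	/* startup ran and teardown has not */
	bool is_smt_sibling;
};

/* Run the toy "teardown callbacks" from the current state down to target. */
static void toy_down(struct toy_cpu *cpu, enum toy_state target)
{
	assert(target <= cpu->state);
	while (cpu->state > target) {
		cpu->setup_done[cpu->state] = false;
		cpu->state = (enum toy_state)(cpu->state - 1);
	}
}

/*
 * Run the toy "startup callbacks" up to target. With nosmt, an SMT sibling
 * reaches TOY_ONLINE_IDLE and then fails; on failure we roll back only to
 * the state this call started from.
 */
static int toy_up(struct toy_cpu *cpu, enum toy_state target, bool nosmt)
{
	enum toy_state start = cpu->state;

	while (cpu->state < target) {
		enum toy_state next = (enum toy_state)(cpu->state + 1);

		if (nosmt && cpu->is_smt_sibling && next == TOY_ONLINE) {
			toy_down(cpu, start);
			return -1;
		}
		cpu->setup_done[next] = true;
		cpu->state = next;
	}
	return 0;
}

/* Parallel phase first, then the serial phase, as in bringup_nonboot_cpus(). */
static int toy_bringup(struct toy_cpu *cpu, bool nosmt, bool full_rollback)
{
	int ret;

	toy_up(cpu, TOY_PARALLEL_DYN_END, nosmt);	/* cannot fail in this model */
	ret = toy_up(cpu, TOY_ONLINE, nosmt);
	if (ret && full_rollback)
		toy_down(cpu, TOY_OFFLINE);	/* what the new hunk adds */
	return ret;
}

/* Count states whose teardown was never issued. */
static int toy_leftover_states(const struct toy_cpu *cpu)
{
	int n = 0;

	for (int s = TOY_OFFLINE + 1; s < TOY_NR_STATES; s++)
		n += cpu->setup_done[s];
	return n;
}
```

In this model, an SMT sibling brought up with nosmt and full_rollback ==
false is left at TOY_PARALLEL_DYN_END with two states never torn down.
With full_rollback == true it ends at TOY_OFFLINE with nothing left over.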
 

