Message-ID: <87fs6t8y7p.ffs@tglx>
Date:   Thu, 15 Jun 2023 00:40:10 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ashok Raj <ashok.raj@...el.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Mario Limonciello <mario.limonciello@....com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Tony Battersby <tonyb@...ernetics.com>,
        Ashok Raj <ashok.raj@...ux.intel.com>,
        Tony Luck <tony.luck@...el.com>,
        Arjan van de Veen <arjan@...ux.intel.com>,
        Eric Biederman <ebiederm@...ssion.com>,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: [patch V2 1/8] x86/smp: Make stop_other_cpus() more robust

On Wed, Jun 14 2023 at 13:47, Ashok Raj wrote:
> On Wed, Jun 14, 2023 at 09:53:21PM +0200, Thomas Gleixner wrote:
>> 
>> Now let me look into this NMI cruft.
>> 
>
> Maybe if each CPU going down can set their mask, we can simply hit NMI to
> only the problematic ones?
>
> The simple count doesn't capture the CPUs in trouble.

Even a mask is not cutting it. If CPUs did not react to the reboot
vector, then there is no guarantee that they are not going to do so
concurrently with the NMI IPI:

CPU0                          CPU1

IPI(BROADCAST, REBOOT);
wait() // timeout
                              stop_this_cpu()
if (!all_stopped()) {
  for_each_cpu(cpu, mask) {
                                mark_stopped(); <- all_stopped() == true now
    IPI(cpu, NMI);
  }                            --> NMI()

  // no wait() because all_stopped() == true
}

proceed_and_hope() ....

On bare metal this is likely to "work" by chance, but in a guest all
bets are off.
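
The interleaving above can be replayed in program order as a small,
single-threaded C model. This is only an illustrative sketch: struct
model, all_stopped() as written here, and replay_race() are hypothetical
names made up for the example, not the kernel's code, and real CPUs
would of course run these steps concurrently rather than in a fixed
order.

```c
#include <stdbool.h>

#define NR_CPUS 2   /* CPU0 drives the shutdown, CPU1 is the late responder */

/* Hypothetical model state; these are not the kernel's data structures. */
struct model {
	bool stopped[NR_CPUS];     /* set by a CPU once it has parked itself */
	bool nmi_pending[NR_CPUS]; /* NMI IPI sent to the CPU, not yet handled */
};

/* True when every CPU other than CPU0 has marked itself stopped. */
static bool all_stopped(const struct model *m)
{
	for (int cpu = 1; cpu < NR_CPUS; cpu++) {
		if (!m->stopped[cpu])
			return false;
	}
	return true;
}

/*
 * Replays the interleaving from the diagram in program order.
 * Returns true when CPU0 proceeds without waiting although an NMI
 * is still pending on CPU1.
 */
static bool replay_race(struct model *m)
{
	/* CPU0: reboot IPI was broadcast and wait() timed out. */
	if (!all_stopped(m)) {
		/* CPU1: reacts late to the reboot vector and marks itself stopped. */
		m->stopped[1] = true;

		/* CPU0: is already inside the loop and sends the NMI anyway. */
		m->nmi_pending[1] = true;
	}

	/* CPU0: the check that would trigger a second wait() now says "done". */
	bool cpu0_waits = !all_stopped(m);

	return !cpu0_waits && m->nmi_pending[1];
}
```

With a zero-initialised struct model, replay_race() returns true: the
mask reports every CPU as stopped while the NMI sent to CPU1 has not
been handled yet, so CPU0 skips the second wait and proceeds.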

I'm not surprised at all.

The approach of piling hardware and firmware legacy on top of hardware
and firmware legacy in the hope that we can "fix" that in software was
wrong from the very beginning.

What's surprising is that this worked for a really long time. Though
with increasing complexity, the debris it produced is starting to
rear its ugly head.

I'm sure the marketing departments of _all_ x86 vendors will come up
with a brilliant slogan for that. Something like:

  "We are committed to ensuring that you are able to experience the
   failures of the past forever with increasingly improved performance
   and new exciting features which are fully backwards failure
   compatible."

TBH, the (OS) software industry has helped that proliferate by joining the
'features first' choir without much thought or pushback. See
arch/x86/kernel/cpu/* for prime examples.

Ranted enough. I'm going to sleep now and will look at this mess tomorrow
morning with my brain awake again. Though that will not change the
underlying problem, which is unfixable.

Thanks,

        tglx