Message-ID: <ZIonT7+sxAx8IWEE@a4bf019067fa.jf.intel.com>
Date: Wed, 14 Jun 2023 13:47:11 -0700
From: Ashok Raj <ashok.raj@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: LKML <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
Mario Limonciello <mario.limonciello@....com>,
Tom Lendacky <thomas.lendacky@....com>,
"Tony Battersby" <tonyb@...ernetics.com>,
Ashok Raj <ashok.raj@...ux.intel.com>,
Tony Luck <tony.luck@...el.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
Eric Biederman <ebiederm@...ssion.com>,
Ashok Raj <ashok.raj@...el.com>
Subject: Re: [patch V2 1/8] x86/smp: Make stop_other_cpus() more robust
On Wed, Jun 14, 2023 at 09:53:21PM +0200, Thomas Gleixner wrote:
> > If we go down the INIT path, life is less complicated...
> >
> > After the REBOOT_VECTOR IPI, if stop_cpus_count > 0, we send an NMI to all
> > CPUs. Won't this completely upset the atomic_dec() accounting, since CPUs
> > already parked in hlt() will also take the NMI and decrement again? I'm
> > not sure if this is problematic.
> >
> > Or should we reinitialize stop_cpus_count before the NMI hurrah?
>
> Bah. Didn't think about HLT. Let me go back to the drawing board. Good catch!
>
> >> + /*
> >> + * Ensure that the cache line is invalidated on the other CPUs. See
> >> + * comment vs. SME in stop_this_cpu().
> >> + */
> >> + atomic_set(&stop_cpus_count, INT_MAX);
> >
> > Didn't understand why INT_MAX here?
>
> Any random number will do. The only purpose is to ensure that there is
> no dirty cache line on the other (stopped) CPUs.
>
> Now let me look into this NMI cruft.
>
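For my own notes, the handover I think you are describing looks roughly
like this (an illustrative sketch, not the actual patch; the SME check
is from memory and simplified):

	/* stop_this_cpu(), on each CPU being taken down: */
	atomic_dec(&stop_cpus_count);		/* cache line dirty here */
	if (cpuid_eax(0x8000001f) & BIT(0))	/* SME capable?          */
		native_wbinvd();		/* flush before parking  */
	for (;;)
		native_halt();

	/*
	 * stop_other_cpus(), on the surviving CPU, after the wait:
	 * rewriting the counter takes the cache line exclusive here
	 * and invalidates the stale copies on the stopped CPUs, so
	 * nothing can be written back over the kexec'ed kernel. The
	 * value itself does not matter, hence any constant works.
	 */
	atomic_set(&stop_cpus_count, INT_MAX);
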
Maybe if each CPU going down records itself in a cpumask, we can send the
NMI only to the problematic ones? A plain count doesn't tell us which
CPUs are still in trouble.
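
Something along these lines, perhaps (untested sketch; cpus_stop_mask is
a name I just made up):

	static struct cpumask cpus_stop_mask;

	/* stop_other_cpus(): fill before sending REBOOT_VECTOR     */
	cpumask_copy(&cpus_stop_mask, cpu_online_mask);
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* stop_this_cpu(): each CPU checks out before halting      */
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* timeout path: NMI only the CPUs that never checked out   */
	if (!cpumask_empty(&cpus_stop_mask))
		apic->send_IPI_mask(&cpus_stop_mask, NMI_VECTOR);

A CPU that already reached hlt() is no longer in the mask, so a stray
NMI cannot make it check out twice the way the shared counter can be
decremented twice.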