linux-kernel - Re: [RFC PATCH 1/5] x86/kexec: Do unconditional WBINVD for bare-metal in stop_this

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <23c990b979a5b46a5436279126565198ea985889.camel@intel.com>
Date: Tue, 18 Mar 2025 03:41:56 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "tglx@...utronix.de" <tglx@...utronix.de>, "peterz@...radead.org"
	<peterz@...radead.org>, "mingo@...hat.com" <mingo@...hat.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "Huang, Kai" <kai.huang@...el.com>, "bp@...en8.de"
	<bp@...en8.de>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>
CC: "ashish.kalra@....com" <ashish.kalra@....com>, "dyoung@...hat.com"
	<dyoung@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"x86@...nel.org" <x86@...nel.org>, "sagis@...gle.com" <sagis@...gle.com>,
	"hpa@...or.com" <hpa@...or.com>, "Chatre, Reinette"
	<reinette.chatre@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Williams, Dan J" <dan.j.williams@...el.com>,
	"bhe@...hat.com" <bhe@...hat.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
	"nik.borisov@...e.com" <nik.borisov@...e.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH 1/5] x86/kexec: Do unconditional WBINVD for bare-metal
 in stop_this_cpu()

On Mon, 2025-03-17 at 10:11 +0000, Huang, Kai wrote:
> On Thu, 2025-03-13 at 18:40 +0000, Edgecombe, Rick P wrote:
> > On Thu, 2025-03-13 at 00:34 +1300, Kai Huang wrote:
> > > TL;DR:
> > > 
> > > Change to do unconditional WBINVD in stop_this_cpu() for bare metal to
> > > cover kexec support for both AMD SME and Intel TDX.  Previously there
> > > _was_ some issue preventing from doing so but now it has been fixed.
> >                               ^ Adding "the kernel" here would read a little
> > cleaner to me.
> 
> OK.
> 
> > 
> > 
> > When I read "some issue" I start assuming it wasn't fully debugged and it is
> > "some issue" that no one knows. But below it sounds like it was root caused.
> 
> I am not sure what's wrong with "some issue".  I used "_was_" to convey this
> issue was fixed (thus root caused).  Perhaps the "root caused" part isn't clear?
> I can explicitly call it out.
> 
>   Previously there _was_ some issue preventing the kernel from doing so but 
>   now it has been root caused and fixed. 

The problem is the phrase "some issue" sounds like the issue is not understood.
You could just change it to "an issue". I don't even think you need the "_"
around "_was_". Just "Previously there was an issue preventing the kernel..." is
more reassuring.


[snip]

> > 
> > Instead of a play-by-play, it might be more informative to summarize the edges
> > covered in this history:
> 
> I think the above is also informative.  Boris suggested to keep the discussion
> around those relevant commits in the changelog:
> 
> https://lore.kernel.org/linux-kernel/20240228110207.GCZd8Sr8mXHA2KTiLz@fat_crate.local/

Summary: Kirill says it's too verbose, can be dropped.

> https://lore.kernel.org/linux-kernel/20240415175912.GAZh1q8GgpY3tpJj5a@fat_crate.local/

Summary: Boris says not to drop it and it's ok to be verbose.

But what I'm saying is that the section doesn't summarize the issue well. It
just links to a bunch of commits for the reviewer to go figure it out
themselves. So I think explaining the takeaways instead of only linking to
threads wouldn't be objectionable. You don't need to drop the commit references,
just don't leave so much homework for the reader.

> 
> >   - Don't do anything that writes memory between wbinvd and native_halt(). This
> > includes function calls that touch the stack.
> 
> This is a requirement to SME, but changing to unconditional WBINVD on bare-metal
> doesn't change this behaviour.  What's the purpose of mentioning here?

The purpose is to help the reviewer understand the delicate design of this
function so that they can evaluate whether the changes upset it.

> 
> >   - Avoid issuing wbinvd on multiple CPUs at the same time. As tglx implies it is
> > too expensive.
> 
> Boris suggested to do unconditional WBINVD.  The test by Dave Young also
> confirms there was no issue on the once-problematic platforms:
> 
> https://lore.kernel.org/lkml/CALu+AoSZkq1kz-xjvHkkuJ3C71d0SM5ibEJurdgmkZqZvNp2dQ@mail.gmail.com/

I'm not sure what your point is here. That there is no issue? This log points to
a commit that contradicts the assertions it is making. Especially since to
understand this part of the log, the reviewer is going to have to read those
referenced commits, don't you think it's a problem? It is opening new doubts.

> 
> >   - Don't race reboot by watching cpumask instead of num_online_cpus(). But there
> > is a race still.
> > 
> > Hmm, on the last one tglx says:
> >      The cpumask cannot plug all holes either, but it's better than a raw
> >      counter and allows to restrict the NMI fallback IPI to be sent only the
> >      CPUs which have not reported within the timeout window
> > 
> > Are we opening up more platforms to a race by unconditionally doing the wbinvd?
> > Can we be clarify that nothing bad happens if we lose the race? (and is it
> > true?)
> 
> IIUC in most cases the REBOOT vector will just do the job, stop_other_cpus()
> won't need to send NMIs thus I believe in most cases this shouldn't increase
> race.  
> 
> I don't know what kinda platform will need NMI to stop remote CPUs.  For
> instance, AFAICT my SPR machine with 192 CPUs never needed to send NMI in my
> testings.
> 
> Dave Yong tested on those once-problematic platforms and doing unconditional
> wbinvd was fine.  This patchset (albeit not upstreamed) has also been tested by
> customers in their environment.  I believe doing unconditional WBINVD is fine.

Sounds too much like: "Someone tested it on a platform that used to have a
problem and at least that one is fixed, but we really don't understand what the
issue is".

I haven't tried to understand what race tglx was seeing, or what the consequence
is. I think we should understand and explain why it's harmless, especially since
this bring it up by linking the log that references it.