[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ad70c7a-a2d9-c7d9-4921-13000240c086@amd.com>
Date: Thu, 11 Apr 2024 09:13:35 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: "Huang, Kai" <kai.huang@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, dave.hansen@...el.com, bp@...en8.de,
kirill.shutemov@...ux.intel.com, tglx@...utronix.de, mingo@...hat.com,
hpa@...or.com, luto@...nel.org, peterz@...radead.org,
rick.p.edgecombe@...el.com, ashish.kalra@....com, chao.gao@...el.com,
bhe@...hat.com, nik.borisov@...e.com, pbonzini@...hat.com, seanjc@...gle.com
Subject: Re: [PATCH v3 1/5] x86/kexec: do unconditional WBINVD for bare-metal
in stop_this_cpu()
On 4/10/24 17:26, Huang, Kai wrote:
> On 11/04/2024 4:14 am, Tom Lendacky wrote:
>> On 4/10/24 11:08, Tom Lendacky wrote:
>>> On 4/7/24 07:44, Kai Huang wrote:
>>>
>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>> index b8441147eb5e..5ba8a9c1e47a 100644
>>>> --- a/arch/x86/kernel/process.c
>>>> +++ b/arch/x86/kernel/process.c
>>>> @@ -813,18 +813,16 @@ void __noreturn stop_this_cpu(void *dummy)
>>>> mcheck_cpu_clear(c);
>>>> /*
>>>> - * Use wbinvd on processors that support SME. This provides
>>>> support
>>>> - * for performing a successful kexec when going from SME inactive
>>>> - * to SME active (or vice-versa). The cache must be cleared so
>>>> that
>>>> - * if there are entries with the same physical address, both
>>>> with and
>>>> - * without the encryption bit, they don't race each other when
>>>> flushed
>>>> - * and potentially end up with the wrong entry being committed to
>>>> - * memory.
>>>> + * The kernel could leave caches in incoherent state on SME/TDX
>>>> + * capable platforms. Flush cache to avoid silent memory
>>>> + * corruption for these platforms.
>>>> *
>>>> - * Test the CPUID bit directly because the machine might've
>>>> cleared
>>>> - * X86_FEATURE_SME due to cmdline options.
>>>> + * stop_this_cpu() is not a fast path, just do unconditional
>>>> + * WBINVD for simplicity. But only do WBINVD for bare-metal
>>>> + * as TDX guests and SEV-ES/SEV-SNP guests will get unexpected
>>>> + * (and unnecessary) #VE and may unable to handle.
>>>
>>> In addition to Kirill's comment on #VE...
>>>
>>> This last part of the comment reads a bit odd since you say
>>> unconditional and then say only do WBINVD for bare-metal. Maybe
>>> something like this makes it a bit clearer?:
>>>
>>> For TDX and SEV-ES/SEV-SNP guests, a WBINVD may cause an exception
>>> (#VE or #VC). However, all exception handling has been torn down at
>>> this point, so this would cause the guest to crash. Since memory
>>> within these types of guests is coherent only issue the WBINVD on
>>> bare-metal.
>>
>> Hmmm... actually it was the other WBINVD in patch #2 that caused the
>> crash, so what I wrote above isn't accurate. You might want to re-word
>> as appropriate.
>
> Yeah that's why I used "may unable to handle" in the comment, as I
> thought there's no need to be that specific?
Yes, makes sense.
>
> I tend not to mention "memory within these types of guests is coherent".
> I mean the current upstream kernel _ONLY_ does WBINVD for SME, that
> means for all non-SME environment there's no concern to do WBINVD here.
>
> Here we only extend to do WBINVD on more environments, so as long as
> there's no harm to do WBINVD for them it's OK.
>
> How about below?
>
> /*
> * The kernel could leave caches in incoherent state on SME/TDX
> * capable platforms. Flush cache to avoid silent memory
> * corruption for these platforms.
> *
> * For TDX and SEV-ES/SEV-SNP guests, a WBINVD causes an
> * exception (#VE or #VC), and the kernel may not be able
> * to handle such exception (e.g., TDX guest panics if it
> * sees #VE). Since stop_this_cpu() isn't a fast path, just
> * issue the WBINVD on bare-metal instead of sprinkling
> * around vendor-specific checks.
> */
I think that's ok, but maybe just adding that the WBINVD is not
necessary for TDX and SEV-ES/SEV-SNP would make it clearer. Just my
opinion, though.
Thanks,
Tom
>>
>> Thanks,
>> Tom
>>
>>>
>>> And you can expand the comment block out to at least 80 characters to
>>> make it more compact.
>
> OK I can do. I guess I have to change my vim setting to do so, though :-)
Powered by blists - more mailing lists