[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <224a0aad-077b-4830-8e63-3a42a86120fa@intel.com>
Date: Fri, 12 Apr 2024 09:55:19 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
CC: "x86@...nel.org" <x86@...nel.org>, "Hansen, Dave" <dave.hansen@...el.com>,
"bp@...en8.de" <bp@...en8.de>, "kirill.shutemov@...ux.intel.com"
<kirill.shutemov@...ux.intel.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "hpa@...or.com" <hpa@...or.com>,
"luto@...nel.org" <luto@...nel.org>, "peterz@...radead.org"
<peterz@...radead.org>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"ashish.kalra@....com" <ashish.kalra@....com>, "Gao, Chao"
<chao.gao@...el.com>, "bhe@...hat.com" <bhe@...hat.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>
Subject: Re: [PATCH v3 1/5] x86/kexec: do unconditional WBINVD for bare-metal
in stop_this_cpu()
On 12/04/2024 2:13 am, Tom Lendacky wrote:
> On 4/10/24 17:26, Huang, Kai wrote:
>> On 11/04/2024 4:14 am, Tom Lendacky wrote:
>>> On 4/10/24 11:08, Tom Lendacky wrote:
>>>> On 4/7/24 07:44, Kai Huang wrote:
>>>>
>>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>>> index b8441147eb5e..5ba8a9c1e47a 100644
>>>>> --- a/arch/x86/kernel/process.c
>>>>> +++ b/arch/x86/kernel/process.c
>>>>> @@ -813,18 +813,16 @@ void __noreturn stop_this_cpu(void *dummy)
>>>>> mcheck_cpu_clear(c);
>>>>> /*
>>>>> - * Use wbinvd on processors that support SME. This provides
>>>>> support
>>>>> - * for performing a successful kexec when going from SME inactive
>>>>> - * to SME active (or vice-versa). The cache must be cleared so
>>>>> that
>>>>> - * if there are entries with the same physical address, both
>>>>> with and
>>>>> - * without the encryption bit, they don't race each other when
>>>>> flushed
>>>>> - * and potentially end up with the wrong entry being committed to
>>>>> - * memory.
>>>>> + * The kernel could leave caches in incoherent state on SME/TDX
>>>>> + * capable platforms. Flush cache to avoid silent memory
>>>>> + * corruption for these platforms.
>>>>> *
>>>>> - * Test the CPUID bit directly because the machine might've
>>>>> cleared
>>>>> - * X86_FEATURE_SME due to cmdline options.
>>>>> + * stop_this_cpu() is not a fast path, just do unconditional
>>>>> + * WBINVD for simplicity. But only do WBINVD for bare-metal
>>>>> + * as TDX guests and SEV-ES/SEV-SNP guests will get unexpected
>>>>> + * (and unnecessary) #VE and may unable to handle.
>>>>
>>>> In addition to Kirill's comment on #VE...
>>>>
>>>> This last part of the comment reads a bit odd since you say
>>>> unconditional and then say only do WBINVD for bare-metal. Maybe
>>>> something like this makes it a bit clearer?:
>>>>
>>>> For TDX and SEV-ES/SEV-SNP guests, a WBINVD may cause an exception
>>>> (#VE or #VC). However, all exception handling has been torn down at
>>>> this point, so this would cause the guest to crash. Since memory
>>>> within these types of guests is coherent only issue the WBINVD on
>>>> bare-metal.
>>>
>>> Hmmm... actually it was the other WBINVD in patch #2 that caused the
>>> crash, so what I wrote above isn't accurate. You might want to re-word
>>> as appropriate.
>>
>> Yeah that's why I used "may unable to handle" in the comment, as I
>> thought there's no need to be that specific?
>
> Yes, makes sense.
>
>>
>> I tend not to mention "memory within these types of guests is coherent".
>> I mean the current upstream kernel _ONLY_ does WBINVD for SME, that
>> means for all non-SME environment there's no concern to do WBINVD here.
>>
>> Here we only extend to do WBINVD on more environments, so as long as
>> there's no harm to do WBINVD for them it's OK.
>>
>> How about below?
>>
>> /*
>> * The kernel could leave caches in incoherent state on SME/TDX
>> * capable platforms. Flush cache to avoid silent memory
>> * corruption for these platforms.
>> *
>> * For TDX and SEV-ES/SEV-SNP guests, a WBINVD causes an
>> * exception (#VE or #VC), and the kernel may not be able
>> * to handle such exception (e.g., TDX guest panics if it
>> * sees #VE). Since stop_this_cpu() isn't a fast path, just
>> * issue the WBINVD on bare-metal instead of sprinkling
>> * around vendor-specific checks.
>> */
>
> I think that's ok, but maybe just adding that the WBINVD is not
> necessary for TDX and SEV-ES/SEV-SNP would make it clearer. Just my
> opinion, though.
>
Yeah can do that.
Will add "WBINVD is not necessary for TDX and SEV-ES/SEV-SNP guests"
before starting the "Since stop_this_cpu() ...".
Thanks for the feedback.
Powered by blists - more mailing lists