lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 11 Apr 2024 09:13:35 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: "Huang, Kai" <kai.huang@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, dave.hansen@...el.com, bp@...en8.de,
 kirill.shutemov@...ux.intel.com, tglx@...utronix.de, mingo@...hat.com,
 hpa@...or.com, luto@...nel.org, peterz@...radead.org,
 rick.p.edgecombe@...el.com, ashish.kalra@....com, chao.gao@...el.com,
 bhe@...hat.com, nik.borisov@...e.com, pbonzini@...hat.com, seanjc@...gle.com
Subject: Re: [PATCH v3 1/5] x86/kexec: do unconditional WBINVD for bare-metal
 in stop_this_cpu()

On 4/10/24 17:26, Huang, Kai wrote:
> On 11/04/2024 4:14 am, Tom Lendacky wrote:
>> On 4/10/24 11:08, Tom Lendacky wrote:
>>> On 4/7/24 07:44, Kai Huang wrote:
>>>
>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>> index b8441147eb5e..5ba8a9c1e47a 100644
>>>> --- a/arch/x86/kernel/process.c
>>>> +++ b/arch/x86/kernel/process.c
>>>> @@ -813,18 +813,16 @@ void __noreturn stop_this_cpu(void *dummy)
>>>>       mcheck_cpu_clear(c);
>>>>       /*
>>>> -     * Use wbinvd on processors that support SME. This provides 
>>>> support
>>>> -     * for performing a successful kexec when going from SME inactive
>>>> -     * to SME active (or vice-versa). The cache must be cleared so 
>>>> that
>>>> -     * if there are entries with the same physical address, both 
>>>> with and
>>>> -     * without the encryption bit, they don't race each other when 
>>>> flushed
>>>> -     * and potentially end up with the wrong entry being committed to
>>>> -     * memory.
>>>> +     * The kernel could leave caches in incoherent state on SME/TDX
>>>> +     * capable platforms.  Flush cache to avoid silent memory
>>>> +     * corruption for these platforms.
>>>>        *
>>>> -     * Test the CPUID bit directly because the machine might've 
>>>> cleared
>>>> -     * X86_FEATURE_SME due to cmdline options.
>>>> +     * stop_this_cpu() is not a fast path, just do unconditional
>>>> +     * WBINVD for simplicity.  But only do WBINVD for bare-metal
>>>> +     * as TDX guests and SEV-ES/SEV-SNP guests will get unexpected
>>>> +     * (and unnecessary) #VE and may unable to handle.
>>>
>>> In addition to Kirill's comment on #VE...
>>>
>>> This last part of the comment reads a bit odd since you say 
>>> unconditional and then say only do WBINVD for bare-metal. Maybe 
>>> something like this makes it a bit clearer?:
>>>
>>> For TDX and SEV-ES/SEV-SNP guests, a WBINVD may cause an exception 
>>> (#VE or #VC). However, all exception handling has been torn down at 
>>> this point, so this would cause the guest to crash. Since memory 
>>> within these types of guests is coherent only issue the WBINVD on 
>>> bare-metal.
>>
>> Hmmm... actually it was the other WBINVD in patch #2 that caused the 
>> crash, so what I wrote above isn't accurate. You might want to re-word 
>> as appropriate.
> 
> Yeah that's why I used "may unable to handle" in the comment, as I 
> thought there's no need to be that specific?

Yes, makes sense.

> 
> I tend not to mention "memory within these types of guests is coherent". 
>   I mean the current upstream kernel _ONLY_ does WBINVD for SME, that 
> means for all non-SME environment there's no concern to do WBINVD here.
> 
> Here we only extend to do WBINVD on more environments, so as long as 
> there's no harm to do WBINVD for them it's OK.
> 
> How about below?
> 
>      /*
>       * The kernel could leave caches in incoherent state on SME/TDX
>       * capable platforms.  Flush cache to avoid silent memory
>       * corruption for these platforms.
>       *
>       * For TDX and SEV-ES/SEV-SNP guests, a WBINVD causes an
>       * exception (#VE or #VC), and the kernel may not be able
>       * to handle such exception (e.g., TDX guest panics if it
>       * sees #VE).  Since stop_this_cpu() isn't a fast path, just
>       * issue the WBINVD on bare-metal instead of sprinkling
>       * around vendor-specific checks.
>       */

I think that's ok, but maybe just adding that the WBINVD is not 
necessary for TDX and SEV-ES/SEV-SNP would make it clearer. Just my 
opinion, though.

Thanks,
Tom

>>
>> Thanks,
>> Tom
>>
>>>
>>> And you can expand the comment block out to at least 80 characters to 
>>> make it more compact.
> 
> OK I can do.  I guess I have to change my vim setting to do so, though :-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ