lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 12 Apr 2024 09:55:19 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
CC: "x86@...nel.org" <x86@...nel.org>, "Hansen, Dave" <dave.hansen@...el.com>,
	"bp@...en8.de" <bp@...en8.de>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>, "hpa@...or.com" <hpa@...or.com>,
	"luto@...nel.org" <luto@...nel.org>, "peterz@...radead.org"
	<peterz@...radead.org>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"ashish.kalra@....com" <ashish.kalra@....com>, "Gao, Chao"
	<chao.gao@...el.com>, "bhe@...hat.com" <bhe@...hat.com>,
	"nik.borisov@...e.com" <nik.borisov@...e.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>
Subject: Re: [PATCH v3 1/5] x86/kexec: do unconditional WBINVD for bare-metal
 in stop_this_cpu()



On 12/04/2024 2:13 am, Tom Lendacky wrote:
> On 4/10/24 17:26, Huang, Kai wrote:
>> On 11/04/2024 4:14 am, Tom Lendacky wrote:
>>> On 4/10/24 11:08, Tom Lendacky wrote:
>>>> On 4/7/24 07:44, Kai Huang wrote:
>>>>
>>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>>> index b8441147eb5e..5ba8a9c1e47a 100644
>>>>> --- a/arch/x86/kernel/process.c
>>>>> +++ b/arch/x86/kernel/process.c
>>>>> @@ -813,18 +813,16 @@ void __noreturn stop_this_cpu(void *dummy)
>>>>>        mcheck_cpu_clear(c);
>>>>>        /*
>>>>> -     * Use wbinvd on processors that support SME. This provides
>>>>> support
>>>>> -     * for performing a successful kexec when going from SME inactive
>>>>> -     * to SME active (or vice-versa). The cache must be cleared so
>>>>> that
>>>>> -     * if there are entries with the same physical address, both
>>>>> with and
>>>>> -     * without the encryption bit, they don't race each other when
>>>>> flushed
>>>>> -     * and potentially end up with the wrong entry being committed to
>>>>> -     * memory.
>>>>> +     * The kernel could leave caches in incoherent state on SME/TDX
>>>>> +     * capable platforms.  Flush cache to avoid silent memory
>>>>> +     * corruption for these platforms.
>>>>>         *
>>>>> -     * Test the CPUID bit directly because the machine might've
>>>>> cleared
>>>>> -     * X86_FEATURE_SME due to cmdline options.
>>>>> +     * stop_this_cpu() is not a fast path, just do unconditional
>>>>> +     * WBINVD for simplicity.  But only do WBINVD for bare-metal
>>>>> +     * as TDX guests and SEV-ES/SEV-SNP guests will get unexpected
>>>>> +     * (and unnecessary) #VE and may unable to handle.
>>>>
>>>> In addition to Kirill's comment on #VE...
>>>>
>>>> This last part of the comment reads a bit odd since you say
>>>> unconditional and then say only do WBINVD for bare-metal. Maybe
>>>> something like this makes it a bit clearer?:
>>>>
>>>> For TDX and SEV-ES/SEV-SNP guests, a WBINVD may cause an exception
>>>> (#VE or #VC). However, all exception handling has been torn down at
>>>> this point, so this would cause the guest to crash. Since memory
>>>> within these types of guests is coherent only issue the WBINVD on
>>>> bare-metal.
>>>
>>> Hmmm... actually it was the other WBINVD in patch #2 that caused the
>>> crash, so what I wrote above isn't accurate. You might want to re-word
>>> as appropriate.
>>
>> Yeah that's why I used "may unable to handle" in the comment, as I
>> thought there's no need to be that specific?
> 
> Yes, makes sense.
> 
>>
>> I tend not to mention "memory within these types of guests is coherent".
>>    I mean the current upstream kernel _ONLY_ does WBINVD for SME, that
>> means for all non-SME environment there's no concern to do WBINVD here.
>>
>> Here we only extend to do WBINVD on more environments, so as long as
>> there's no harm to do WBINVD for them it's OK.
>>
>> How about below?
>>
>>       /*
>>        * The kernel could leave caches in incoherent state on SME/TDX
>>        * capable platforms.  Flush cache to avoid silent memory
>>        * corruption for these platforms.
>>        *
>>        * For TDX and SEV-ES/SEV-SNP guests, a WBINVD causes an
>>        * exception (#VE or #VC), and the kernel may not be able
>>        * to handle such exception (e.g., TDX guest panics if it
>>        * sees #VE).  Since stop_this_cpu() isn't a fast path, just
>>        * issue the WBINVD on bare-metal instead of sprinkling
>>        * around vendor-specific checks.
>>        */
> 
> I think that's ok, but maybe just adding that the WBINVD is not
> necessary for TDX and SEV-ES/SEV-SNP would make it clearer. Just my
> opinion, though.
> 

Yeah can do that.

Will add "WBINVD is not necessary for TDX and SEV-ES/SEV-SNP guests" 
before starting the "Since stop_this_cpu() ...".

Thanks for the feedback.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ