[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9164b0e3-f520-4aab-8b80-520131f0e4db@amd.com>
Date: Mon, 19 Feb 2024 13:45:37 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: Borislav Petkov <bp@...en8.de>, "Huang, Kai" <kai.huang@...el.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org, dave.hansen@...el.com,
 kirill.shutemov@...ux.intel.com, tglx@...utronix.de, mingo@...hat.com,
 hpa@...or.com, luto@...nel.org, peterz@...radead.org, chao.gao@...el.com,
 bhe@...hat.com, nik.borisov@...e.com, pbonzini@...hat.com
Subject: Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush
 during kexec
On 2/19/24 10:16, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
>> From: Kai Huang <kai.huang@...el.com>
>>
>> Currently on AMD SME platforms, during kexec() caches are flushed
>> manually before jumping to the new kernel due to memory encryption.
>> Intel TDX needs to flush cachelines of TDX private memory before
>> jumping to the second kernel too, otherwise they may silently corrupt
>> the new kernel.
>>
>> Instead of sprinkling both AMD and Intel's specific checks around,
>> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
>> Intel and AMD, and simplify the logic:
>>
>> 	Could the old kernel leave incoherent caches around?
> 
> 	"Is it possible that the kernel could leave caches in incoherent state?"
> 
>> 	If so, do WBINVD.
>>
>> Convert the AMD SME to use this new CC attribute.
> 
>> A later patch will
>> utilize this new attribute for Intel TDX too.
> 
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.
> 
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
> 
> In this particular case, that "later patch" can go.
> 
>> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
>> 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
>> WBINVD for the reason that the current kernel's SME enabling status may
>> not match the new kernel's choice.  However the relocate_kernel() only
>> does the WBINVD when the current kernel has enabled SME for the reason
>> that the new kernel is always placed in an "unencrypted" area.
>>
>> To simplify the logic, for AMD SME change to always use the way that is
>> done in stop_this_cpu().  This will cause an additional WBINVD in
>> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
>> disabled by kernel command line), but this is acceptable for the sake of
>> having less complicated code (see [1] for the relevant discussion).
>>
>> Note currently the kernel only advertises CC vendor for AMD SME when SME
>> is actually enabled by the kernel.  To always advertise the new
>> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
>> status, change to set CC vendor as long as the hardware has enabled SME.
>>
>> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
>> enabled SME" is still different from "checking the CPUID" (the way that
>> is done in stop_this_cpu()), but technically the former also serves the
>> purpose and is actually more accurate.
>>
>> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
>> But this doesn't impact other CC attributes on AMD platforms, nor does
>> it impact the cc_mkdec()/cc_mkenc().
>>
>> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
>>
>> Suggested-by: Dave Hansen <dave.hansen@...el.com>
>> Signed-off-by: Kai Huang <kai.huang@...el.com>
>> ---
>>   arch/x86/coco/core.c               | 13 +++++++++++++
>>   arch/x86/kernel/machine_kexec_64.c |  2 +-
>>   arch/x86/kernel/process.c          | 14 +++-----------
>>   arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>>   include/linux/cc_platform.h        | 15 +++++++++++++++
>>   5 files changed, 42 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
>> index eeec9986570e..8d6d727e6e18 100644
>> --- a/arch/x86/coco/core.c
>> +++ b/arch/x86/coco/core.c
>> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>>   	case CC_ATTR_HOST_MEM_ENCRYPT:
>>   		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>>   
>> +	case CC_ATTR_HOST_MEM_INCOHERENT:
>> +		/*
>> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
>> +		 * enabled on the platform regardless whether the kernel
>> +		 * has actually enabled the SME.
>> +
> 
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
> 
> I think this needs to be:
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>   
>          switch (attr) {
>          case CC_ATTR_MEM_ENCRYPT:
> +
> +               /*
> +                * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +                * enabled on the platform regardless whether the kernel
> +                * has actually enabled the SME.
> +                */
> +       case CC_ATTR_HOST_MEM_INCOHERENT:
This change won't return the correct answer. The check needs to remain 
against the sev_status value.
>                  return sme_me_mask;
>   
>          case CC_ATTR_HOST_MEM_ENCRYPT:
> 
> 
>> +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
>> +
>> +	/*
>> +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
>> +	 * as it must be true when there's any SEV enable bit set in
>> +	 * sev_status.
>> +	 */
> 
> Superfluous comment.
> 
>>   	case CC_ATTR_GUEST_MEM_ENCRYPT:
>>   		return sev_status & MSR_AMD64_SEV_ENABLED;
>>   
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index bc0a5348b4a6..c9c6974e2e9c 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>>   				       (unsigned long)page_list,
>>   				       image->start,
>>   				       image->preserve_context,
>> -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
>> +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>>   
>>   #ifdef CONFIG_KEXEC_JUMP
>>   	if (image->preserve_context)
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index ab49ade31b0d..2c7e8d9889c0 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>>   	mcheck_cpu_clear(c);
>>   
>>   	/*
>> -	 * Use wbinvd on processors that support SME. This provides support
>> -	 * for performing a successful kexec when going from SME inactive
>> -	 * to SME active (or vice-versa). The cache must be cleared so that
>> -	 * if there are entries with the same physical address, both with and
>> -	 * without the encryption bit, they don't race each other when flushed
>> -	 * and potentially end up with the wrong entry being committed to
>> -	 * memory.
>> -	 *
>> -	 * Test the CPUID bit directly because the machine might've cleared
>> -	 * X86_FEATURE_SME due to cmdline options.
>> +	 * Use wbinvd on processors that the first kernel *could*
>> +	 * potentially leave incoherent cachelines.
> 
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.
> 
>>   	 */
>> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>> +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>>   		native_wbinvd();
>>   
>>   	/*
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 7f72472a34d6..87e4fddab770 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>>   		msr = __rdmsr(MSR_AMD64_SYSCFG);
>>   		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>>   			return;
>> +
>> +		/*
>> +		 * Always set CC vendor when the platform has SME enabled
>> +		 * regardless whether the kernel will actually activates the
>> +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
>> +		 * being true as long as the platform has SME enabled so that
>> +		 * stop_this_cpu() can do necessary WBINVD during kexec().
>> +		 */
>> +		cc_vendor = CC_VENDOR_AMD;
>>   	} else {
>>   		/* SEV state cannot be controlled by a command line option */
>>   		sme_me_mask = me_mask;
>> +		cc_vendor = CC_VENDOR_AMD;
>>   		goto out;
>>   	}
>>   
> 
> So you can't put it before the if - just slap it in both branches. Geez!
I think that will still work because sme_me_mask and sev_status will both 
be 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't 
evaluate to true. However, that will cause any platform that hasn't 
enabled memory encryption (see SYS_CFG MSR), to also perform the WBINVD.
The idea looks like it was to keep existing behavior where cc_vendor 
wasn't set if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' is false.
> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
>          if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
>                  snp_abort();
>   
> +       cc_vendor = CC_VENDOR_AMD;
> +
>          /* Check if memory encryption is enabled */
>          if (feature_mask == AMD_SME_BIT) {
>                  /*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
>   out:
>          RIP_REL_REF(sme_me_mask) = me_mask;
>          physical_mask &= ~me_mask;
> -       cc_vendor = CC_VENDOR_AMD;
>          cc_set_mask(me_mask);
>   }
> 
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.
> 
>> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>>   out:
>>   	if (sme_me_mask) {
>>   		physical_mask &= ~sme_me_mask;
>> -		cc_vendor = CC_VENDOR_AMD;
>>   		cc_set_mask(sme_me_mask);
>>   	}
>>   }
>> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
>> index cb0d6cd1c12f..2f7273596102 100644
>> --- a/include/linux/cc_platform.h
>> +++ b/include/linux/cc_platform.h
>> @@ -42,6 +42,21 @@ enum cc_attr {
>>   	 */
>>   	CC_ATTR_HOST_MEM_ENCRYPT,
>>   
> 
> This goes to the end of the enum.
> 
>> +	/**
>> +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
>> +	 * incoherent
> 
> "...can leave caches in an incoherent state."
> 
>> +	 *
>> +	 * The platform/OS is running as a bare-metal system or a hypervisor.
>> +	 * The memory encryption engine might have left non-cache-coherent
>> +	 * data in the caches that needs to be flushed.
>> +	 *
>> +	 * Use this in places where the cache coherency of the memory matters
>> +	 * but the encryption status does not.
>> +	 *
>> +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
> 
> If that is the case, why do you even need a new CC_ATTR define?
> 
> Might as well do:
> 
> 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> 		native_wbinvd();
That won't work, because the current system may not have SME active. The 
cases that needs to be caught are kexec'ing from a mem_encrypt=off to a 
mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
For the first case, you need to determine if the system is SME capable, 
because cc_platform_has(CC_ATTR_MEM_ENCRYPT) will return false if SME is 
not active.
Thanks,
Tom
> 
> ?
> 
>> +	 */
>> +	CC_ATTR_HOST_MEM_INCOHERENT,
>> +
> 
> 
Powered by blists - more mailing lists
 
