[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9164b0e3-f520-4aab-8b80-520131f0e4db@amd.com>
Date: Mon, 19 Feb 2024 13:45:37 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: Borislav Petkov <bp@...en8.de>, "Huang, Kai" <kai.huang@...el.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org, dave.hansen@...el.com,
kirill.shutemov@...ux.intel.com, tglx@...utronix.de, mingo@...hat.com,
hpa@...or.com, luto@...nel.org, peterz@...radead.org, chao.gao@...el.com,
bhe@...hat.com, nik.borisov@...e.com, pbonzini@...hat.com
Subject: Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush
during kexec
On 2/19/24 10:16, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
>> From: Kai Huang <kai.huang@...el.com>
>>
>> Currently on AMD SME platforms, during kexec() caches are flushed
>> manually before jumping to the new kernel due to memory encryption.
>> Intel TDX needs to flush cachelines of TDX private memory before
>> jumping to the second kernel too, otherwise they may silently corrupt
>> the new kernel.
>>
>> Instead of sprinkling both AMD and Intel's specific checks around,
>> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
>> Intel and AMD, and simplify the logic:
>>
>> Could the old kernel leave incoherent caches around?
>
> "Is it possible that the kernel could leave caches in incoherent state?"
>
>> If so, do WBINVD.
>>
>> Convert the AMD SME to use this new CC attribute.
>
>> A later patch will
>> utilize this new attribute for Intel TDX too.
>
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.
>
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
>
> In this particular case, that "later patch" can go.
>
>> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
>> 2) relocate_kernel(). stop_this_cpu() checks the CPUID directly to do
>> WBINVD for the reason that the current kernel's SME enabling status may
>> not match the new kernel's choice. However the relocate_kernel() only
>> does the WBINVD when the current kernel has enabled SME for the reason
>> that the new kernel is always placed in an "unencrypted" area.
>>
>> To simplify the logic, for AMD SME change to always use the way that is
>> done in stop_this_cpu(). This will cause an additional WBINVD in
>> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
>> disabled by kernel command line), but this is acceptable for the sake of
>> having less complicated code (see [1] for the relevant discussion).
>>
>> Note currently the kernel only advertises CC vendor for AMD SME when SME
>> is actually enabled by the kernel. To always advertise the new
>> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
>> status, change to set CC vendor as long as the hardware has enabled SME.
>>
>> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
>> enabled SME" is still different from "checking the CPUID" (the way that
>> is done in stop_this_cpu()), but technically the former also serves the
>> purpose and is actually more accurate.
>>
>> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
>> But this doesn't impact other CC attributes on AMD platforms, nor does
>> it impact the cc_mkdec()/cc_mkenc().
>>
>> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
>>
>> Suggested-by: Dave Hansen <dave.hansen@...el.com>
>> Signed-off-by: Kai Huang <kai.huang@...el.com>
>> ---
>> arch/x86/coco/core.c | 13 +++++++++++++
>> arch/x86/kernel/machine_kexec_64.c | 2 +-
>> arch/x86/kernel/process.c | 14 +++-----------
>> arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>> include/linux/cc_platform.h | 15 +++++++++++++++
>> 5 files changed, 42 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
>> index eeec9986570e..8d6d727e6e18 100644
>> --- a/arch/x86/coco/core.c
>> +++ b/arch/x86/coco/core.c
>> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>> case CC_ATTR_HOST_MEM_ENCRYPT:
>> return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>>
>> + case CC_ATTR_HOST_MEM_INCOHERENT:
>> + /*
>> + * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
>> + * enabled on the platform regardless whether the kernel
>> + * has actually enabled the SME.
>> +
>
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
>
> I think this needs to be:
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>
> switch (attr) {
> case CC_ATTR_MEM_ENCRYPT:
> +
> + /*
> + * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> + * enabled on the platform regardless whether the kernel
> + * has actually enabled the SME.
> + */
> + case CC_ATTR_HOST_MEM_INCOHERENT:
This change won't return the correct answer. The check needs to remain
against the sev_status value.
> return sme_me_mask;
>
> case CC_ATTR_HOST_MEM_ENCRYPT:
>
>
>> + return !(sev_status & MSR_AMD64_SEV_ENABLED);
>> +
>> + /*
>> + * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
>> + * as it must be true when there's any SEV enable bit set in
>> + * sev_status.
>> + */
>
> Superfluous comment.
>
>> case CC_ATTR_GUEST_MEM_ENCRYPT:
>> return sev_status & MSR_AMD64_SEV_ENABLED;
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index bc0a5348b4a6..c9c6974e2e9c 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>> (unsigned long)page_list,
>> image->start,
>> image->preserve_context,
>> - cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
>> + cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>>
>> #ifdef CONFIG_KEXEC_JUMP
>> if (image->preserve_context)
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index ab49ade31b0d..2c7e8d9889c0 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>> mcheck_cpu_clear(c);
>>
>> /*
>> - * Use wbinvd on processors that support SME. This provides support
>> - * for performing a successful kexec when going from SME inactive
>> - * to SME active (or vice-versa). The cache must be cleared so that
>> - * if there are entries with the same physical address, both with and
>> - * without the encryption bit, they don't race each other when flushed
>> - * and potentially end up with the wrong entry being committed to
>> - * memory.
>> - *
>> - * Test the CPUID bit directly because the machine might've cleared
>> - * X86_FEATURE_SME due to cmdline options.
>> + * Use wbinvd on processors that the first kernel *could*
>> + * potentially leave incoherent cachelines.
>
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.
>
>> */
>> - if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>> + if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>> native_wbinvd();
>>
>> /*
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 7f72472a34d6..87e4fddab770 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>> msr = __rdmsr(MSR_AMD64_SYSCFG);
>> if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>> return;
>> +
>> + /*
>> + * Always set CC vendor when the platform has SME enabled
>> + * regardless whether the kernel will actually activates the
>> + * SME or not. This reports the CC_ATTR_HOST_MEM_INCOHERENT
>> + * being true as long as the platform has SME enabled so that
>> + * stop_this_cpu() can do necessary WBINVD during kexec().
>> + */
>> + cc_vendor = CC_VENDOR_AMD;
>> } else {
>> /* SEV state cannot be controlled by a command line option */
>> sme_me_mask = me_mask;
>> + cc_vendor = CC_VENDOR_AMD;
>> goto out;
>> }
>>
>
> So you can't put it before the if - just slap it in both branches. Geez!
I think that will still work because sme_me_mask and sev_status will both
be 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't
evaluate to true. However, that will cause any platform that hasn't
enabled memory encryption (see SYS_CFG MSR), to also perform the WBINVD.
The idea looks like it was to keep existing behavior where cc_vendor
wasn't set if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' is false.
>
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
> if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> + cc_vendor = CC_VENDOR_AMD;
> +
> /* Check if memory encryption is enabled */
> if (feature_mask == AMD_SME_BIT) {
> /*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
> out:
> RIP_REL_REF(sme_me_mask) = me_mask;
> physical_mask &= ~me_mask;
> - cc_vendor = CC_VENDOR_AMD;
> cc_set_mask(me_mask);
> }
>
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.
>
>> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>> out:
>> if (sme_me_mask) {
>> physical_mask &= ~sme_me_mask;
>> - cc_vendor = CC_VENDOR_AMD;
>> cc_set_mask(sme_me_mask);
>> }
>> }
>> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
>> index cb0d6cd1c12f..2f7273596102 100644
>> --- a/include/linux/cc_platform.h
>> +++ b/include/linux/cc_platform.h
>> @@ -42,6 +42,21 @@ enum cc_attr {
>> */
>> CC_ATTR_HOST_MEM_ENCRYPT,
>>
>
> This goes to the end of the enum.
>
>> + /**
>> + * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
>> + * incoherent
>
> "...can leave caches in an incoherent state."
>
>> + *
>> + * The platform/OS is running as a bare-metal system or a hypervisor.
>> + * The memory encryption engine might have left non-cache-coherent
>> + * data in the caches that needs to be flushed.
>> + *
>> + * Use this in places where the cache coherency of the memory matters
>> + * but the encryption status does not.
>> + *
>> + * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
>
> If that is the case, why do you even need a new CC_ATTR define?
>
> Might as well do:
>
> if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> native_wbinvd();
That won't work, because the current system may not have SME active. The
cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
For the first case, you need to determine if the system is SME capable,
because cc_platform_has(CC_ATTR_MEM_ENCRYPT) will return false if SME is
not active.
Thanks,
Tom
>
> ?
>
>> + */
>> + CC_ATTR_HOST_MEM_INCOHERENT,
>> +
>
>
Powered by blists - more mailing lists