[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SN6PR2101MB1693C405F2DA48F078D38CF3D7CD9@SN6PR2101MB1693.namprd21.prod.outlook.com>
Date: Sat, 28 Jan 2023 19:26:05 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
Brijesh Singh <brijesh.singh@....com>,
Michael Roth <michael.roth@....com>,
Ashish Kalra <ashish.kalra@....com>,
Tom Lendacky <thomas.lendacky@....com>
Subject: RE: [RFC PATCH v1 1/6] x86/hyperv: Allocate RMP table during boot
From: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com> Sent: Monday, January 23, 2023 8:51 AM
>
> Hyper-V VMs can be capable of hosting SNP isolated nested VMs on AMD
> CPUs. One of the pieces of SNP is the RMP (Reverse Map) table which
> tracks page assignment to firmware, hypervisor or guest. On bare-metal
> this table is allocated by UEFI, but on Hyper-V it is the respnsibility
s/respnsibility/responsibility/
> of the OS to allocate one if necessary. The nested_feature
> 'HV_X64_NESTED_NO_RMP_TABLE' will be set to communicate that no rmp is
> available. The actual RMP table is exclusively controlled by the Hyper-V
> hypervisor and is not virtualized to the VM. The SNP code in the kernel
> uses the RMP table for its own tracking and so it is necessary for init
> code to allocate one.
>
> While not strictly necessary, follow the requirements defined by "SEV
> Secure Nested Paging Firmware ABI Specification" Rev 1.54, section 8.8.2
> when allocating the RMP:
>
> - RMP_BASE and RMP_END must be set identically across all cores.
> - RMP_BASE must be 1 MB aligned
> - RMP_END – RMP_BASE + 1 must be a multiple of 1 MB
> - RMP is large enough to protect itself
>
> The allocation is done in the init_mem_mapping() hook, which is the
> earliest hook I found that has both max_pfn and memblock initialized. At
> this point we are still under the
> memblock_set_current_limit(ISA_END_ADDRESS) condition, but explicitly
> passing the end to memblock_phys_alloc_range() allows us to allocate
> past that value.
>
> Signed-off-by: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 5 ++++
> arch/x86/include/asm/hyperv-tlfs.h | 3 +++
> arch/x86/include/asm/mshyperv.h | 3 +++
> arch/x86/include/asm/sev.h | 2 ++
> arch/x86/kernel/cpu/mshyperv.c | 41 ++++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 1 -
> 6 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 29774126e931..e7f5ac075e6d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -117,6 +117,11 @@ static int hv_cpu_init(unsigned int cpu)
> }
> }
>
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && hv_needs_snp_rmp()) {
Could the IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) condition be
folded into the implementation of hv_needs_snp_rmp() so that only one
test is needed?
> + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> + }
> +
> return hyperv_init_ghcb();
> }
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index e3efaf6e6b62..01cc2c3f9f20 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -152,6 +152,9 @@
> */
> #define HV_X64_NESTED_ENLIGHTENED_TLB BIT(22)
>
> +/* Nested SNP on Hyper-V */
> +#define HV_X64_NESTED_NO_RMP_TABLE BIT(23)
> +
Just for my clarification, is this flag always set in an SNP guest when
running on a version of Hyper-V that supports nested SNP? I'm
presuming "yes". But there may be older versions of Hyper-V
that support SNP guests, but not nested SNP guests, in which case
this flag would be clear.
> /* HYPERV_CPUID_ISOLATION_CONFIG.EAX bits. */
> #define HV_PARAVISOR_PRESENT BIT(0)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..3533b002cede 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -190,6 +190,9 @@ static inline void hv_ghcb_terminate(unsigned int set, unsigned
> int reason) {}
>
> extern bool hv_isolation_type_snp(void);
>
> +extern struct resource rmp_res;
> +bool hv_needs_snp_rmp(void);
> +
> static inline bool hv_is_synic_reg(unsigned int reg)
> {
> if ((reg >= HV_REGISTER_SCONTROL) &&
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 2916f4150ac7..db5438663229 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -83,6 +83,8 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> /* RMUPDATE detected 4K page and 2MB page overlap. */
> #define RMPUPDATE_FAIL_OVERLAP 7
>
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> +
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> #define RMP_PG_SIZE_2M 1
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 831613959a92..e7f02412f3a1 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -17,6 +17,7 @@
> #include <linux/irq.h>
> #include <linux/kexec.h>
> #include <linux/i8253.h>
> +#include <linux/memblock.h>
> #include <linux/random.h>
> #include <linux/swiotlb.h>
> #include <asm/processor.h>
> @@ -31,6 +32,7 @@
> #include <asm/timer.h>
> #include <asm/reboot.h>
> #include <asm/nmi.h>
> +#include <asm/sev.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/numa.h>
> #include <asm/coco.h>
> @@ -488,6 +490,44 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
> return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
> }
>
> +struct resource rmp_res = {
> + .name = "RMP",
> + .start = 0,
> + .end = 0,
> + .flags = IORESOURCE_SYSTEM_RAM,
> +};
> +
> +bool hv_needs_snp_rmp(void)
> +{
> + return boot_cpu_has(X86_FEATURE_SEV_SNP) &&
> + (ms_hyperv.nested_features & HV_X64_NESTED_NO_RMP_TABLE);
> +}
> +
> +
> +static void __init ms_hyperv_init_mem_mapping(void)
> +{
> + phys_addr_t addr;
> + u64 calc_rmp_sz;
> +
> + if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> + return;
> + if (!hv_needs_snp_rmp())
> + return;
Another case where it would be cleaner if all the
conditions could be folded into hv_needs_snp_rmp().
> +
> + calc_rmp_sz = (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> + calc_rmp_sz = round_up(calc_rmp_sz, SZ_1M);
> + addr = memblock_phys_alloc_range(calc_rmp_sz, SZ_1M, 0, max_pfn << PAGE_SHIFT);
> + if (!addr) {
> + pr_warn("Unable to allocate RMP table\n");
> + return;
> + }
> + rmp_res.start = addr;
> + rmp_res.end = addr + calc_rmp_sz - 1;
> + wrmsrl(MSR_AMD64_RMP_BASE, rmp_res.start);
> + wrmsrl(MSR_AMD64_RMP_END, rmp_res.end);
> + insert_resource(&iomem_resource, &rmp_res);
> +}
> +
> const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> .name = "Microsoft Hyper-V",
> .detect = ms_hyperv_platform,
> @@ -495,4 +535,5 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
> .init.x2apic_available = ms_hyperv_x2apic_available,
> .init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
> .init.init_platform = ms_hyperv_init_platform,
> + .init.init_mem_mapping = ms_hyperv_init_mem_mapping,
On versions of Hyper-V that support nested SNP guests, it appears that every
L1 SNP guest will allocate memory for the RMP, even if it never runs an L2 guest.
The amount of memory allocated is 16 bytes per 4K page, or 0.4% of the total
memory size of the L1 VM. In most cases the memory will be unused. For
example, that works out to be 256 Mbytes in a 64 Gbyte VM, which to me is
a fairly big chunk of memory to waste, even though the percentage isn't huge.
Should we have a CONFIG option that controls whether the RMP is allocated?
L1 guests that intend to run their own L2 guests would need to be built with
this CONFIG option. Presumably the allocation must be done early to ensure
the ability to get a big chunk of contiguous physical memory. Allocating
the RMP later only if it is needed isn't an option.
> };
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 1dd1b36bdfea..7fa39dc17edd 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -87,7 +87,6 @@ struct rmpentry {
> * The first 16KB from the RMP_BASE is used by the processor for the
> * bookkeeping, the range needs to be added during the RMP entry lookup.
> */
> -#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> #define RMPENTRY_SHIFT 8
> #define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>
> --
> 2.25.1
Powered by blists - more mailing lists