Message-ID: <20180904223200.GA7248@linux.intel.com>
Date: Tue, 4 Sep 2018 15:32:00 -0700
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Brijesh Singh <brijesh.singh@....com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...e.de>, "H. Peter Anvin" <hpa@...or.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH v4 4/4] x86/kvm: use __decrypted attribute in shared
variables

On Mon, Sep 03, 2018 at 08:29:42PM -0500, Brijesh Singh wrote:
> Commit 368a540e0232 ("x86/kvmclock: Remove memblock dependency")
> caused a SEV guest regression. When SEV is active, we map the shared
> variables (wall_clock and hv_clock_boot) with C=0 to ensure that both
> the guest and the hypervisor are able to access the data. To map the
> variables we use kernel_physical_mapping_init() to split the large pages,
> but splitting large pages requires allocating a new PMD, which fails now
> that kvmclock initialization is called early during boot.
>
> Recently we added a special .data..decrypted section to hold the shared
> variables. This section is mapped with C=0 early during boot. Use the
> __decrypted attribute to put wall_clock and hv_clock_boot in the
> .data..decrypted section so that they are mapped with C=0.
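(For readers without the rest of the series handy: __decrypted comes
from an earlier patch in this series and is essentially just a section
annotation; paraphrasing from memory, not verbatim, it's something
like the below.)

        /* Paraphrased from the earlier patch, placement may differ: */
        #define __decrypted __attribute__((__section__(".data..decrypted")))
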
>
> Signed-off-by: Brijesh Singh <brijesh.singh@....com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@....com>
> Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency")
> Cc: Tom Lendacky <thomas.lendacky@....com>
> Cc: kvm@...r.kernel.org
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Borislav Petkov <bp@...e.de>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: linux-kernel@...r.kernel.org
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Sean Christopherson <sean.j.christopherson@...el.com>
> Cc: "Radim Krčmář" <rkrcmar@...hat.com>
> ---
> arch/x86/kernel/kvmclock.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 1e67646..08f5f8a 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -28,6 +28,7 @@
> #include <linux/sched/clock.h>
> #include <linux/mm.h>
> #include <linux/slab.h>
> +#include <linux/set_memory.h>
>
> #include <asm/hypervisor.h>
> #include <asm/mem_encrypt.h>
> @@ -61,8 +62,8 @@ early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
> (PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
>
> static struct pvclock_vsyscall_time_info
> - hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
> -static struct pvclock_wall_clock wall_clock;
> + hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> +static struct pvclock_wall_clock wall_clock __decrypted;
> static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>
> static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
> @@ -267,10 +268,29 @@ static int kvmclock_setup_percpu(unsigned int cpu)
> return 0;
>
> /* Use the static page for the first CPUs, allocate otherwise */
> - if (cpu < HVC_BOOT_ARRAY_SIZE)
> + if (cpu < HVC_BOOT_ARRAY_SIZE) {
> p = &hv_clock_boot[cpu];
> - else
> - p = kzalloc(sizeof(*p), GFP_KERNEL);
> + } else {
> + int rc;
> + unsigned int sz = sizeof(*p);
> +
> + if (sev_active())
> + sz = PAGE_ALIGN(sz);
Hmm, again we're wasting a fairly sizable amount of memory since each
CPU is doing a separate 4k allocation. What if we defined an auxiliary
array in __decrypted to be used for CPUs >= HVC_BOOT_ARRAY_SIZE when
SEV is active? struct pvclock_vsyscall_time_info is 32 bytes, so we
could handle the max of 8192 CPUs with 256KB of data (252KB if you
subtract the pre-existing 4KB page), i.e. the SEV case wouldn't need
additional memory beyond the 2MB page that's reserved for __decrypted.
The non-SEV case could do free_kernel_image_pages() on the unused
array (which would need to be page sized) so it wouldn't waste memory.
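Something along these lines is what I have in mind (completely
untested; hv_clock_aux, HVC_AUX_ARRAY_SIZE and kvmclock_free_aux() are
made-up names, and this assumes the two-argument form of
free_kernel_image_pages()):

        /*
         * Aux array covers CPUs that don't fit in the boot array. The
         * ternary just keeps the sketch legal when NR_CPUS is small.
         */
        #define HVC_AUX_ARRAY_SIZE \
                (NR_CPUS > HVC_BOOT_ARRAY_SIZE ? NR_CPUS - HVC_BOOT_ARRAY_SIZE : 1)

        /* Page aligned so the non-SEV case can hand it back wholesale. */
        static struct pvclock_vsyscall_time_info
                hv_clock_aux[HVC_AUX_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);

        /* In kvmclock_setup_percpu(), replacing the kzalloc()-only path: */
        if (cpu < HVC_BOOT_ARRAY_SIZE)
                p = &hv_clock_boot[cpu];
        else if (sev_active())
                p = &hv_clock_aux[cpu - HVC_BOOT_ARRAY_SIZE];
        else
                p = kzalloc(sizeof(*p), GFP_KERNEL);

        /* Called once during init in the !sev_active() case. */
        static void __init kvmclock_free_aux(void)
        {
                /* Round the end up so we free whole pages of the image. */
                free_kernel_image_pages(&hv_clock_aux[0],
                        (void *)PAGE_ALIGN((unsigned long)&hv_clock_aux[HVC_AUX_ARRAY_SIZE]));
        }
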
> +
> + p = kzalloc(sz, GFP_KERNEL);
> +
> + /*
> + * The physical address of the per-cpu variable will be shared with
> + * the hypervisor. Clear the C-bit before assigning the memory to
> + * the per_cpu variable.
> + */
> + if (p && sev_active()) {
> + rc = set_memory_decrypted((unsigned long)p, sz >> PAGE_SHIFT);
> + if (rc)
@p is being leaked if set_memory_decrypted() fails.
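I.e. something like (untested):

        rc = set_memory_decrypted((unsigned long)p, sz >> PAGE_SHIFT);
        if (rc) {
                kfree(p);
                return rc;
        }
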
> + return rc;
> + memset(p, 0, sz);
> + }
> + }
>
> per_cpu(hv_clock_per_cpu, cpu) = p;
> return p ? 0 : -ENOMEM;
> --
> 2.7.4
>