linux-kernel - Re: [PATCH v7 0/5] x86: Fix SEV guest regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1809131622180.1473@nanos.tec.linutronix.de>
Date:   Thu, 13 Sep 2018 18:22:17 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Brijesh Singh <brijesh.singh@....com>
cc:     x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Tom Lendacky <thomas.lendacky@....com>,
        Borislav Petkov <bp@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH v7 0/5] x86: Fix SEV guest regression

On Mon, 10 Sep 2018, Brijesh Singh wrote:
> x86/kvmclock: Remove memblock dependency
> introduced SEV guest regression.
> 
> The guest physical address holding the wall_clock and hv_clock_boot
> are shared with the hypervisor must be mapped with C=0 when SEV
> is active. To clear the C-bit we use  kernel_physical_mapping_init() to
> split the large pages. The above commit moved the kvmclock initialization
> very early and kernel_physical_mapping_init() fails to allocate memory
> while spliting the large page.
> 
> To solve it, we add a special .data..decrypted section, this section can be
> used to hold the shared variables. Early boot code maps this section with
> C=0. The section is pmd aligned and sized to avoid the need to split the pages.
> Caller can use __decrypted attribute to add the variables in .data..decrypted
> section. 

I went through this patch series and to be honest this is in no way -rc3+
material.

I really have to ask why this whole change to sme_encrypt_kernel() is
required at all.

Lets look at the problem at hand:

  We need exactly TWO pages to be shareable at early boot when the
  kernel runs in a guest and wants to utilize kvmclock.

  No, we do not need the whole NR_CPUS sized thing at that point at
  all. The secondary CPUs are brought up way later and there is absolutely
  no point in having the ugly decrypted..aux hack. That conditional freeing
  is really horrible and prone to break sooner than later.

  Further the TWO pages are BSS pages which does not require to handle them
  special other than fixing up the PMD and doing a memset(0) on the
  actually utilized TWO pages.

We really can enforce that such decrypted variables have to be in the BSS
for now and revisit this when there is an absolute requirement to have
statically initialized ones, which I doubt we have.

So the way simpler solution is:

1) Add a .bss..decrypted section which is PMD aligned and mark the
   kvmclock hv_clock_boot and wallclock struct with __bss_decrypted.

2) Fixup the .bss..decrypted section PMD in the SEV case at the end of
   sme_encrypt_kernel() and do a memset(0) on that. That does not require
   any of the restructuring, really.

3) Have a function which is called after the page allocator is up which
   does:

static struct pvclock_vsyscall_time_info *hvclock_mem;

void __init kvmclock_init_mem(void)
{
	unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
	unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
	struct page *p;

	p = alloc_pages(order, GFP_KERNEL);
	if (p) {
		hvclock_mem = page_address(p);
		set_memory_decrypted((unsigned long) hvclock_mem,
				     PAGE_SIZE << order);
	}
}

4) In kvmclock_setup_percpu() do the following:

        if (cpu < HVC_BOOT_ARRAY_SIZE)
                p = &hv_clock_boot[cpu];
	else if (hvclock_mem)
		p = hvclock_mem + cpu - HVC_BOOT_ARRAY_SIZE;
	else
		return -ENOMEM;

5) Unconditionally free the pages after the used .bbs..decrypted variables,
   so the unused rest of the 2M page is not lost.

That should be a halfways slim sized and non instrusive changeset.

Thanks,

	tglx