lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <43c95311-dd8b-77c7-0949-28fc3f961bef@amd.com>
Date:   Thu, 13 Sep 2018 12:22:34 -0500
From:   Brijesh Singh <brijesh.singh@....com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Tom Lendacky <thomas.lendacky@....com>,
        Borislav Petkov <bp@...e.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH v7 0/5] x86: Fix SEV guest regression



On 09/13/2018 11:22 AM, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Brijesh Singh wrote:
>> x86/kvmclock: Remove memblock dependency
>> introduced SEV guest regression.
>>
>> The guest physical address holding the wall_clock and hv_clock_boot
>> are shared with the hypervisor must be mapped with C=0 when SEV
>> is active. To clear the C-bit we use  kernel_physical_mapping_init() to
>> split the large pages. The above commit moved the kvmclock initialization
>> very early and kernel_physical_mapping_init() fails to allocate memory
>> while spliting the large page.
>>
>> To solve it, we add a special .data..decrypted section, this section can be
>> used to hold the shared variables. Early boot code maps this section with
>> C=0. The section is pmd aligned and sized to avoid the need to split the pages.
>> Caller can use __decrypted attribute to add the variables in .data..decrypted
>> section.
> 
> I went through this patch series and to be honest this is in no way -rc3+
> material.
> 
> I really have to ask why this whole change to sme_encrypt_kernel() is
> required at all.


The main reason behind making the changes in sme_encrypt_kernel() was
to perverse the initialized values (if any). Since we are adding
  __decrypted attribute hence I thought it would make sense to support
the usecase where in future someone may use this attribute on 
initialized variables.

Now the question is, do we really need this to fix the regression? the
answer is NO. Since there is only one user of this new attribute and
lucky it does not initialize the variables hence it is safe to move the
variable in .bss..decrypted section and memset(0).

> 
> Lets look at the problem at hand:
> 
>    We need exactly TWO pages to be shareable at early boot when the
>    kernel runs in a guest and wants to utilize kvmclock.
> 
>    No, we do not need the whole NR_CPUS sized thing at that point at
>    all. The secondary CPUs are brought up way later and there is absolutely
>    no point in having the ugly decrypted..aux hack. That conditional freeing
>    is really horrible and prone to break sooner than later.
> 
>    Further the TWO pages are BSS pages which does not require to handle them
>    special other than fixing up the PMD and doing a memset(0) on the
>    actually utilized TWO pages.
> 
> We really can enforce that such decrypted variables have to be in the BSS
> for now and revisit this when there is an absolute requirement to have
> statically initialized ones, which I doubt we have.
> 
> So the way simpler solution is:
> 
> 1) Add a .bss..decrypted section which is PMD aligned and mark the
>     kvmclock hv_clock_boot and wallclock struct with __bss_decrypted.
>  > 2) Fixup the .bss..decrypted section PMD in the SEV case at the end of
>     sme_encrypt_kernel() and do a memset(0) on that. That does not require
>     any of the restructuring, really.
> 

Somewhere during the discussion, I was asked to make sure that
  __decrypted attribute can be used by others in future and don't tie it
with just the kvmclock.

To fix the regression we don't need to have this complexity. I am okay
to implement your proposal. I would like to fix this regression sooner
than later ;)


> 3) Have a function which is called after the page allocator is up which
>     does:
> 

I am glad you are pointing this one. In my initial patch I was doing a
kmalloc() of page-size, during review we did discussed to do allocation
once using num_possible_cpus()[like what you have proposed]. But then
discussion moved towards saving memory and that when the secondary array
concept came into the picture. Since .data..decrypted has some unused 
memory hence we were getting creative in reusing it.



> static struct pvclock_vsyscall_time_info *hvclock_mem;
> 
> void __init kvmclock_init_mem(void)
> {
> 	unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
> 	unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
> 	struct page *p;
> 
> 	p = alloc_pages(order, GFP_KERNEL);
> 	if (p) {
> 		hvclock_mem = page_address(p);
> 		set_memory_decrypted((unsigned long) hvclock_mem,
> 				     PAGE_SIZE << order);
> 	}
> }
> 
> 4) In kvmclock_setup_percpu() do the following:
> 
>          if (cpu < HVC_BOOT_ARRAY_SIZE)
>                  p = &hv_clock_boot[cpu];
> 	else if (hvclock_mem)
> 		p = hvclock_mem + cpu - HVC_BOOT_ARRAY_SIZE;
> 	else
> 		return -ENOMEM;
> 
> 5) Unconditionally free the pages after the used .bbs..decrypted variables,
>     so the unused rest of the 2M page is not lost.
> 
> That should be a halfways slim sized and non instrusive changeset.
> 

I am okay to implement and test your recommendation, if anyone
disagree then please let me know. thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ