linux-kernel - Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210513065703.GA8173@ashkalra_ubuntu_server>
Date:   Thu, 13 May 2021 06:57:03 +0000
From:   Ashish Kalra <ashish.kalra@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Borislav Petkov <bp@...en8.de>, pbonzini@...hat.com,
        tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
        joro@...tes.org, thomas.lendacky@....com, x86@...nel.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        srutherford@...gle.com, venu.busireddy@...cle.com,
        brijesh.singh@....com
Subject: Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption
 status is changed

Hello Sean,

On Wed, May 12, 2021 at 03:51:10PM +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Borislav Petkov wrote:
> > On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > > +						  int npages, bool enc)
> > > +{
> > > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > > +}
> > 
> > Now the question is whether something like that is needed for TDX, and,
> > if so, could it be shared by both.
> 
> Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime.  Ditto
> for SEV-SNP.  I wanted to squish everything into a single common hypercall, but
> that didn't pan out.
> 
> The problem is that both TDX and SNP define their own versions of this so that
> any guest kernel that complies with the TDX|SNP specification will run cleanly
> on a hypervisor that also complies with the spec.  This KVM-specific hook doesn't
> meet those requires because non-Linux guest support will be sketchy at best, and
> non-KVM hypervisor support will be non-existent.
> 
> The best we can do, short of refusing to support TDX or SNP, is to make this
> KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
> control logic is identical.  The mechanics of actually invoking the hypercall
> will differ, but if done right, everything else should be reusable without
> modification.
> 
> I had an in-depth analysis of this, but it was all off-list.  Pasted below. 
> 
>   TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
>   from time zero.  SNP could technically do the same (with a revised GHCB spec),
>   but it'd be butt ugly.  And of course trying to go that route for either TDX or
>   SNP would run into the problem of having to coordinate the ABI for the "legacy"
>   hypercall across all guests and hosts.  So yeah, trying to remove any of the
>   three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.
> 
>   That being said, I do think we can reuse the KVM specific hypercall for TDX and
>   SNP.  Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
>   vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
>   KVM enlightened from switching to the KVM specific hypercall once it can do so.
>   More thoughts later on.
> 
>   > I guess a common structure could be used along the lines of what is in the
>   > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
>   > only ever really do a single page range at a time (through
>   > set_memory_encrypted() and set_memory_decrypted()). The reason for the
>   > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
>   > page ranges in advance. Will TDX need to do something similar?
> 
>   Yes, TDX needs the exact same thing.  All three (SEV, SNP, and TDX) have more or
>   less the exact same hook in the guest (Linux, obviously) kernel.
> 
>   > If so, the only real common piece in KVM is a function to track what pages
>   > are shared vs private, which would only require a simple interface.
> 
>   It's not just KVM, it's also the relevant code in the guest kernel(s) and other
>   hypervisors.  And the MMU side of KVM will likely be able to share code, e.g. to
>   act on the page size hint.
> 
>   > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
>   > range is really all that is needed, but it must be common across
>   > hypervisors. I think that was one Sean's points originally, we don't want
>   > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.
> 
>   For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
>   to make it a superset of the SNP and TDX protocols so that it _can_ be used in
>   lieu of the SNP/TDX specific protocol.  I don't know for sure whether or not
>   that will actually yield better code and/or performance, but it costs us almost
>   nothing and at least gives us the option of further optimizing the Linux+KVM
>   combination.
> 
>   It probably shouldn't be a strict superset, as in practice I don't think SNP
>   approach of having individual entries when batching multiple pages will yield
>   the best performance.  E.g. the vast majority (maybe all?) of conversions for a
>   Linux guest will be physically contiguous and will have the same preferred page
>   size, at which point there will be less overhead if the guest specifies a
>   massive range as opposed to having to santize and fill a large buffer.
> 
>   TL;DR: I think the KVM hypercall should be something like this, so that it can
>   be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
>   performance enhancements or something.
> 
>     8. KVM_HC_MAP_GPA_RANGE
>     -----------------------
>     :Architecture: x86
>     :Status: active
>     :Purpose: Request KVM to map a GPA range with the specified attributes.
> 
>     a0: the guest physical address of the start page
>     a1: the number of (4kb) pages (must be contiguous in GPA space)
>     a2: attributes
> 
>   where 'attributes' could be something like:
> 
>     bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
>     bit     4 - plaintext = 0, encrypted = 1
>     bits 63:5 - reserved (must be zero)
> 

Ok. Will modify page encryption status hypercall to be compatible with
the above defined interface.

Thanks,
Ashish