linux-kernel - Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YJv5bjd0xThIahaa@google.com>
Date:   Wed, 12 May 2021 15:51:10 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Ashish Kalra <Ashish.Kalra@....com>, pbonzini@...hat.com,
        tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
        joro@...tes.org, thomas.lendacky@....com, x86@...nel.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        srutherford@...gle.com, venu.busireddy@...cle.com,
        brijesh.singh@....com
Subject: Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption
 status is changed

On Wed, May 12, 2021, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > +						  int npages, bool enc)
> > +{
> > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > +}
> 
> Now the question is whether something like that is needed for TDX, and,
> if so, could it be shared by both.

Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime.  Ditto
for SEV-SNP.  I wanted to squish everything into a single common hypercall, but
that didn't pan out.

The problem is that both TDX and SNP define their own versions of this so that
any guest kernel that complies with the TDX|SNP specification will run cleanly
on a hypervisor that also complies with the spec.  This KVM-specific hook doesn't
meet those requires because non-Linux guest support will be sketchy at best, and
non-KVM hypervisor support will be non-existent.

The best we can do, short of refusing to support TDX or SNP, is to make this
KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
control logic is identical.  The mechanics of actually invoking the hypercall
will differ, but if done right, everything else should be reusable without
modification.

I had an in-depth analysis of this, but it was all off-list.  Pasted below. 

  TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
  from time zero.  SNP could technically do the same (with a revised GHCB spec),
  but it'd be butt ugly.  And of course trying to go that route for either TDX or
  SNP would run into the problem of having to coordinate the ABI for the "legacy"
  hypercall across all guests and hosts.  So yeah, trying to remove any of the
  three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.

  That being said, I do think we can reuse the KVM specific hypercall for TDX and
  SNP.  Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
  vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
  KVM enlightened from switching to the KVM specific hypercall once it can do so.
  More thoughts later on.

  > I guess a common structure could be used along the lines of what is in the
  > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
  > only ever really do a single page range at a time (through
  > set_memory_encrypted() and set_memory_decrypted()). The reason for the
  > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
  > page ranges in advance. Will TDX need to do something similar?

  Yes, TDX needs the exact same thing.  All three (SEV, SNP, and TDX) have more or
  less the exact same hook in the guest (Linux, obviously) kernel.

  > If so, the only real common piece in KVM is a function to track what pages
  > are shared vs private, which would only require a simple interface.

  It's not just KVM, it's also the relevant code in the guest kernel(s) and other
  hypervisors.  And the MMU side of KVM will likely be able to share code, e.g. to
  act on the page size hint.

  > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
  > range is really all that is needed, but it must be common across
  > hypervisors. I think that was one Sean's points originally, we don't want
  > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.

  For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
  to make it a superset of the SNP and TDX protocols so that it _can_ be used in
  lieu of the SNP/TDX specific protocol.  I don't know for sure whether or not
  that will actually yield better code and/or performance, but it costs us almost
  nothing and at least gives us the option of further optimizing the Linux+KVM
  combination.

  It probably shouldn't be a strict superset, as in practice I don't think SNP
  approach of having individual entries when batching multiple pages will yield
  the best performance.  E.g. the vast majority (maybe all?) of conversions for a
  Linux guest will be physically contiguous and will have the same preferred page
  size, at which point there will be less overhead if the guest specifies a
  massive range as opposed to having to santize and fill a large buffer.

  TL;DR: I think the KVM hypercall should be something like this, so that it can
  be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
  performance enhancements or something.

    8. KVM_HC_MAP_GPA_RANGE
    -----------------------
    :Architecture: x86
    :Status: active
    :Purpose: Request KVM to map a GPA range with the specified attributes.

    a0: the guest physical address of the start page
    a1: the number of (4kb) pages (must be contiguous in GPA space)
    a2: attributes

  where 'attributes' could be something like:

    bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
    bit     4 - plaintext = 0, encrypted = 1
    bits 63:5 - reserved (must be zero)