linux-kernel - Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

Message-ID: <20201211225542.GA30409@ashkalra_ubuntu_server>
Date:   Fri, 11 Dec 2020 22:55:42 +0000
From:   Ashish Kalra <ashish.kalra@....com>
To:     Brijesh Singh <brijesh.singh@....com>
Cc:     Steve Rutherford <srutherford@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Joerg Roedel <joro@...tes.org>,
        Borislav Petkov <bp@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        X86 ML <x86@...nel.org>, KVM list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, dovmurik@...ux.vnet.ibm.com,
        tobin@....com, jejb@...ux.ibm.com, frankeh@...ibm.com,
        dgilbert@...hat.com
Subject: Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

Hello All,

On Tue, Dec 08, 2020 at 10:29:05AM -0600, Brijesh Singh wrote:
> 
> On 12/7/20 9:09 PM, Steve Rutherford wrote:
> > On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >> On Sun, Dec 06, 2020, Paolo Bonzini wrote:
> >>> On 03/12/20 01:34, Sean Christopherson wrote:
> >>>> On Tue, Dec 01, 2020, Ashish Kalra wrote:
> >>>>> From: Brijesh Singh <brijesh.singh@....com>
> >>>>>
> >>>>> KVM hypercall framework relies on alternative framework to patch the
> >>>>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> >>>>> apply_alternative() is called then it defaults to VMCALL. The approach
> >>>>> works fine on non SEV guest. A VMCALL would cause #UD, and hypervisor
> >>>>> will be able to decode the instruction and do the right things. But
> >>>>> when SEV is active, guest memory is encrypted with guest key and
> >>>>> hypervisor will not be able to decode the instruction bytes.
> >>>>>
> >>>>> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> >>>>> will be used by the SEV guest to notify encrypted pages to the hypervisor.
> >>>> What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> >>>> and opt into VMCALL?  It's a synthetic feature flag either way, and I don't
> >>>> think there are any existing KVM hypercalls that happen before alternatives are
> >>>> patched, i.e. it'll be a nop for sane kernel builds.
> >>>>
> >>>> I'm also skeptical that a KVM specific hypercall is the right approach for the
> >>>> encryption behavior, but I'll take that up in the patches later in the series.
> >>> Do you think that it's the guest that should "donate" memory for the bitmap
> >>> instead?
> >> No.  Two things I'd like to explore:
> >>
> >>   1. Making the hypercall to announce/request private vs. shared common across
> >>      hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* and TDX).
> >>      I'm concerned that we'll end up with multiple hypercalls that do more or
> >>      less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc...  Maybe it's a
> >>      pipe dream, but I'd like to at least explore options before shoving in KVM-
> >>      only hypercalls.
> >>
> >>
> >>   2. Tracking shared memory via a list of ranges instead of using a bitmap to
> >>      track all of guest memory.  For most use cases, the vast majority of guest
> >>      memory will be private, most ranges will be 2mb+, and conversions between
> >>      private and shared will be uncommon events, i.e. the overhead to walk and
> >>      split/merge list entries is hopefully not a big concern.  I suspect a list
> >>      would consume far less memory, hopefully without impacting performance.
> > For a fancier data structure, I'd suggest an interval tree. Linux
> > already has an rbtree-based interval tree implementation, which would
> > likely work, and would probably assuage any performance concerns.
> >
> > Something like this would not be worth doing unless most of the shared
> > pages were physically contiguous. A sample Ubuntu 20.04 VM on GCP had
> > 60ish discontiguous shared regions. This is by no means a thorough
> > search, but it's suggestive. If this is typical, then the bitmap would
> > be far less efficient than most any interval-based data structure.
> >
> > You'd have to allow userspace to upper bound the number of intervals
> > (similar to the maximum bitmap size), to prevent host OOMs due to
> > malicious guests. There's something nice about the guest donating
> > memory for this, since that would eliminate the OOM risk.
> 
> 
> Tracking the list of ranges may not be a bad idea, especially if we use
> some kind of rbtree-based data structure to update the ranges. It
> will certainly be better than a bitmap, which grows with the guest
> memory size, and as you have seen in practice most of the pages will
> be guest private. I am not sure whether the guest donating memory will
> cover all the cases, e.g. what if we do a memory hotplug (increase the
> guest RAM from 2GB to 64GB): will the donated memory range be enough to
> store the metadata?
> 
>

Following internal discussions on the points above, I am going to look
into the specific items listed below:

1). "hypercall" related:
a). Explore the SEV-SNP page change request structure (included in GHCB)
and see if there is something common there that can be re-used for the
SEV/SEV-ES page encryption status hypercalls.
b). Explore whether there is any common hypercall framework I can use in
Linux/KVM.

2). "backing" data structure related: explore using a range-based list
or something like an rbtree-based interval tree data structure (as
mentioned by Steve above) to replace the current bitmap-based
implementation.
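
To make item 2 a bit more concrete, below is a rough userspace C sketch of
what a range-based list could look like. It is only an illustration under
stated assumptions and not the proposed KVM implementation: every name in it
(struct shared_ranges, mark_shared, mark_private, is_shared, bitmap_bytes,
MAX_SHARED_RANGES) is hypothetical, it uses a small sorted array rather than
the kernel's interval tree, and the fixed cap only stands in for the
userspace-provided upper bound on the number of intervals that Steve
mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap, standing in for a userspace-configured upper bound. */
#define MAX_SHARED_RANGES 64

/* Half-open range of guest frame numbers: [start, end). */
struct gfn_range { uint64_t start, end; };

/* Sorted, non-overlapping, non-adjacent list of shared ranges. */
struct shared_ranges {
    size_t nr;
    struct gfn_range r[MAX_SHARED_RANGES];
};

/* Mark [start, end) shared, merging overlapping or adjacent ranges. */
static int mark_shared(struct shared_ranges *s, uint64_t start, uint64_t end)
{
    struct gfn_range out[MAX_SHARED_RANGES + 1];
    size_t n = 0, i = 0;

    assert(start < end);
    /* Ranges that end strictly before the new one are kept as-is. */
    while (i < s->nr && s->r[i].end < start)
        out[n++] = s->r[i++];
    /* Ranges that touch or overlap the new one are absorbed into it. */
    while (i < s->nr && s->r[i].start <= end) {
        if (s->r[i].start < start)
            start = s->r[i].start;
        if (s->r[i].end > end)
            end = s->r[i].end;
        i++;
    }
    out[n++] = (struct gfn_range){ start, end };
    while (i < s->nr)
        out[n++] = s->r[i++];
    if (n > MAX_SHARED_RANGES)
        return -1;              /* over the cap: reject the request */
    memcpy(s->r, out, n * sizeof(out[0]));
    s->nr = n;
    return 0;
}

/* Mark [start, end) private, trimming or splitting ranges as needed. */
static int mark_private(struct shared_ranges *s, uint64_t start, uint64_t end)
{
    struct gfn_range out[MAX_SHARED_RANGES + 1];
    size_t n = 0;

    assert(start < end);
    for (size_t i = 0; i < s->nr; i++) {
        struct gfn_range cur = s->r[i];

        if (cur.end <= start || cur.start >= end) {
            out[n++] = cur;     /* no overlap with the private range */
            continue;
        }
        if (cur.start < start)  /* keep the part below the hole */
            out[n++] = (struct gfn_range){ cur.start, start };
        if (cur.end > end)      /* keep the part above the hole */
            out[n++] = (struct gfn_range){ end, cur.end };
    }
    if (n > MAX_SHARED_RANGES)
        return -1;              /* a split pushed us over the cap */
    memcpy(s->r, out, n * sizeof(out[0]));
    s->nr = n;
    return 0;
}

/* Linear lookup; an interval tree would make this logarithmic. */
static bool is_shared(const struct shared_ranges *s, uint64_t gfn)
{
    for (size_t i = 0; i < s->nr; i++)
        if (gfn >= s->r[i].start && gfn < s->r[i].end)
            return true;
    return false;
}

/* Size of a one-bit-per-4K-page bitmap covering guest_bytes of memory. */
static uint64_t bitmap_bytes(uint64_t guest_bytes)
{
    return guest_bytes / 4096 / 8;
}
```

For scale, with 4K pages a bitmap covering a 64GB guest takes 2MB, whereas
60 ranges at 16 bytes each (the count Steve saw on his sample VM) take
under 1KB.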

Thanks,
Ashish