linux-kernel - Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210402110924.GA17630@ashkalra_ubuntu_server>
Date:   Fri, 2 Apr 2021 11:09:24 +0000
From:   Ashish Kalra <ashish.kalra@....com>
To:     Steve Rutherford <srutherford@...gle.com>
Cc:     Will Deacon <will@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "venu.busireddy@...cle.com" <venu.busireddy@...cle.com>,
        "Singh, Brijesh" <brijesh.singh@....com>,
        Quentin Perret <qperret@...gle.com>, maz@...nel.org
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST
 ioctl

Hello Steve,

On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote:
> On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <ashish.kalra@....com> wrote:
> >
> > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@....com> wrote:
> > > >
> > > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > > > [+Marc]
> > > > >
> > > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@....com> wrote:
> > > > > > > > > Thanks for grabbing the data!
> > > > > > > > >
> > > > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > > > >
> > > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > > >
> > > > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > > >
> > > > > > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > > > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > > > > >
> > > > > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > > > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > > > > > implement a common ABI is an unnecessary risk.
> > > > > > >
> > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > > > > require further VMM intervention.  But, I just don't see the point, it would
> > > > > > > save only a few lines of code.  It would also limit what KVM could do in the
> > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > > > > bookkeeping while still honoring the interception request.
> > > > > > >
> > > > > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > > > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > > > > > just not used anywhere.
> > > > > > >
> > > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > > >   struct {
> > > > > > >           __u64 nr;
> > > > > > >           __u64 args[6];
> > > > > > >           __u64 ret;
> > > > > > >           __u32 longmode;
> > > > > > >           __u32 pad;
> > > > > > >   } hypercall;
> > > > > > >
> > > > > > >
> > > > > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > > > > >
> > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > > > > that will result in truly heinous guest code, and extensibility issues.
> > > > > > >
> >
> > We may also need to pass-through the MSR to userspace, as it is a part of this
> > complete host (userspace/kernel), OVMF and guest kernel negotiation of
> > the SEV live migration feature.
> >
> > Host (userspace/kernel) advertises it's support for SEV live migration
> > feature via the CPUID bits, which is queried by OVMF and which in turn
> > adds a new UEFI runtime variable to indicate support for SEV live
> > migration, which is later queried during guest kernel boot and
> > accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> > live migration negotiation and enable it.
> >
> > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> > enables SEV live migration, hence, preventing userspace to start live
> > migration before the feature support has been negotiated and enabled on
> > all the three components - host, guest OVMF and kernel.
> >
> > But, now with this ioctl not existing anymore, we will need to
> > pass-through the MSR to userspace too, for it to only initiate live
> > migration once the feature negotiation has been completed.
> 
> I can't tell if you were waiting for feedback on this before posting
> the follow-up patch series.

Actually, i am going to post the follow-up patch series upstream early
next week. 

I have already added support for MSR handling and exit to userspace, the
current implementation looks like this :

The custom MSR is hooked both in svm_get_msr() and svm_set_msr():

@@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
        case MSR_F10H_DECFG:
                msr_info->data = svm->msr_decfg;
                break;
+       case MSR_KVM_SEV_LIVE_MIGRATION:
+               if (!sev_guest(vcpu->kvm))
+                       return 1;
+
+               if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
+                       return 1;
+
+               /*
+                * Let userspace handle the MSR using MSR filters.
+                */
+               return KVM_MSR_RET_FILTERED;

So there is special check added for both sev_guest() and requisite CPUID
bit set in guest CPUID, if either fails this will signal #GP to guest.

Otherwise, it returns MSR_FILTER return code, which will allow userspace
to use msr intercepts to handle the reads and writes via userspace exits
using KVM_EXIT_X86_RDMSR/KVM_EXIT_X86_WRMSR.

Let me know if you have any feedback/comments on the above handling.

Thanks,
Ashish

> 
> Here are a few options:
> 1) Add the MSR explicitly to the list of custom kvm MSRs, but don't
> have it hooked up anywhere. The expectation would be for the VMM to
> use msr intercepts to handle the reads and writes. If that seems
> weird, have svm_set_msr (or whatever) explicitly ignore it.
> 2) Add a getter and setter for the MSR. Only allow guests to use it if
> they are sev_guests with the requisite CPUID bit set.
> 
> I think I prefer the former, and it should work fine from my
> understanding of the msr intercepts implementation. I'm also open to
> other ideas. You could also have the MSR write trigger a KVM_EXIT of
> the same type as the hypercall, but have it just say "the msr value
> changed to XYZ", but that design sounds awkward.
> 
> Thanks,
> Steve