[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZkN25BPuLtTUmDKk@google.com>
Date: Tue, 14 May 2024 07:36:20 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Michael Roth <michael.roth@....com>
Cc: kvm@...r.kernel.org, linux-coco@...ts.linux.dev, linux-mm@...ck.org,
linux-crypto@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
tglx@...utronix.de, mingo@...hat.com, jroedel@...e.de,
thomas.lendacky@....com, hpa@...or.com, ardb@...nel.org, pbonzini@...hat.com,
vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org,
dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com,
peterz@...radead.org, srinivas.pandruvada@...ux.intel.com,
rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com, bp@...en8.de,
vbabka@...e.cz, kirill@...temov.name, ak@...ux.intel.com, tony.luck@...el.com,
sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com,
jarkko@...nel.org, ashish.kalra@....com, nikunj.dadhania@....com,
pankaj.gupta@....com, liam.merwick@...cle.com
Subject: Re: [PATCH v15 19/20] KVM: SEV: Provide support for
SNP_EXTENDED_GUEST_REQUEST NAE event
On Mon, May 13, 2024, Michael Roth wrote:
> On Mon, May 13, 2024 at 04:48:25PM -0700, Sean Christopherson wrote:
> > Actually, I take that back, this isn't even an optimization, it's literally a
> > non-generic implementation of kvm_run.immediate_exit.
>
> Relying on a generic -EINTR response resulting from kvm_run.immediate_exit
> doesn't seem like a very robust way to ensure the attestation request
> was made to firmware. It seems fully possible that future code changes
> could result in EINTR being returned for other reasons. So how do you
> reliably detect that the EINTR is a result of immediate_exit being called
> after the attestation request is made to firmware? We could squirrel something
> away in struct kvm_run to probe for, but delivering another
> KVM_EXIT_SNP_REQ_CERT with an extra flag set seems to be reasonably
> userspace-friendly.
And unnecessarily specific to a single exit. But it's a non-issue (except
possibly on ARM).
I doubt it's formally documented anywhere, but userspace absolutely relies on
kvm_run.immediate_exit to be processed _after_ complete_userspace_io(). If KVM
exits with -EINTR before invoking cui(), live migration will break due to taking
a snapshot of vCPU state in the middle of an instruction.
Given that userspace has likely built up rigid expectations for immediate_exit,
I don't see any problem formally documenting KVM's behavior, i.e. signing a
contract guaranteeing that KVM will complete the "back half" of emulation if
immediate_exit is set and KVM_RUN return -EINTR.
ARM is the only arch that is at all suspect, due to its rather massive
kvm_arch_vcpu_run_pid_change() hook. At a quick glance, it seems to be ok, too.
And if it's not, we need to fix that asap, because it's like a bug waiting to
happen.
> > If this were an optimization, i.e. KVM truly notified userspace without exiting,
> > then it would need to be a lot more robust, e.g. to ensure userspace actually
> > received the notification before KVM moved on.
>
> Right, this does rely on exiting via , not userspace polling for flags or
> anything along that line.
>
> >
> > > + __u8 flags;
> > > + #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING 0
> > > + #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE 1
> >
> > This is also a weird reimplementation of generic functionality. KVM nullifies
> > vcpu->arch.complete_userspace_io _before_ invoking the callback. So if a callback
> > needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
> > again. In other words, literally doing nothing will get you what you want :-)
>
> We could just have the completion callback set complete_userspace_io
> again, but then you'd always get 2 userspace exit events per attestation
> request. There could be some userspaces that don't implement the
> file-locking scheme, in which case they wouldn't need the 2nd notification.
Then they don't set immediate_exit.
> That's why the KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag is provided
> as an opt-in.
>
> The pending/done status bits are so userspace can distinguish between the
> start of a certificate request and the completion side of it after it gets
> bound a completed attestation request and the filelock can be released.
Powered by blists - more mailing lists