[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAH4kHZkuD4UsXRGED6qecfAkeFpd8sLc+0osDhnKP4T5VmSYQ@mail.gmail.com>
Date: Wed, 28 May 2025 11:25:23 -0700
From: Dionna Amalie Glaze <dionnaglaze@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-coco@...ts.linux.dev, Thomas Lendacky <Thomas.Lendacky@....com>,
Paolo Bonzini <pbonzini@...hat.com>, Joerg Roedel <jroedel@...e.de>, Peter Gonda <pgonda@...gle.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v5 1/2] kvm: sev: Add SEV-SNP guest request throttling
On Wed, May 21, 2025 at 11:19 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Fri, May 16, 2025, Dionna Amalie Glaze wrote:
> > > > @@ -4015,6 +4042,12 @@ static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_
> > > >
> > > > mutex_lock(&sev->guest_req_mutex);
> > > >
> > > > + if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> > > > + svm_vmgexit_no_action(svm, SNP_GUEST_ERR(SNP_GUEST_VMM_ERR_BUSY, 0));
> > > > + ret = 1;
> > > > + goto out_unlock;
> > >
> > > Can you (or anyone) explain what a well-behaved guest will do in in response to
> > > BUSY? And/or explain why KVM injecting an error into the guest is better than
> > > exiting to userspace.
> >
> > Ah, I missed this question. The guest is meant to back off and try again
> > after waiting a bit. This is the behavior added in
> > https://lore.kernel.org/all/20230214164638.1189804-2-dionnaglaze@google.com/
>
> Nice, it's already landed and considered legal VMM behavior.
>
> > If KVM returns to userspace with an exit type that the guest request was
> > throttled, then what is user space supposed to do with that?
>
> The userspace exit doesn't have to notify userspace that the guest was throttled,
> e.g. KVM could exit on _every_ request and let userspace do its own throttling.
>
> I have no idea whether or not that's sane/useful, which is why I'm asking. The
> cover letter, changelog, and documentation are all painfully sparse with respect
> to explaining why *this* uAPI is the right uAPI.
>
> > It could wait a bit before trying KVM_RUN again, but with the enlightened
> > method, the guest could at least work on other kernel tasks while it waits
> > for its turn to get an attestation report.
>
> Nothing prevents KVM from providing userspace a way to communicate VMM_ERR_BUSY,
> e.g. as done for KVM_EXIT_SNP_REQ_CERTS:
>
> https://lore.kernel.org/all/20250428195113.392303-2-michael.roth@amd.com
>
> > Perhaps this is me not understanding the preferred KVM way of doing things.
>
> The only real preference at play is to not end up with uAPI and ABI that doesn't
> fit "everyone's" needs. It's impossible to fully future-proof KVM's ABI, but we
> can at least perform due diligence to ensure we didn't simply pick the the path
> of least resistance.
>
> The bar gets lowered a tiny bit if we go with a module param (which I think we
> should do), but I'd still like an explanation of why a fairly simple ratelimiting
> mechanism is the best overall approach.
Before I send out a revised patchset with changed commit text, what do
you think about the following
The AMD-SP is a precious resource that doesn't have a scheduler other
than a mutex lock queue. To avoid customers from causing a DoS, a
kernel module parameter for rate limiting guest requests is added.
[Addition:]
The kernel module parameter is a lower bound kernel-imposed rate limit
for any SEV-SNP VM-initiated guest request. This does not preclude the
addition of a new KVM exit type for SEV-SNP guest requests for
userspace to impose any additional throttling logic. The default value of
0 maintains the previous behavior that there is no imposed rate limit on
guest requests.
We could still ask Michael to change KVM_EXIT_SNP_REQ_CERTS to
KVM_EXIT_SNP_GUEST_REQ
and for the exit structure to include msg_type as well as the
gfn+npages when the kind is an extended request for an attestation
report so that we don't need to have two exit types.
Regardless of that change for additional throttling opportunities, I
think the system-wide imposed lower bound is important for quelling
noisy neighbors to some degree.
--
-Dionna Glaze, PhD, CISSP, CCSP (she/her)
Powered by blists - more mailing lists