[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMiDn1BJsOxtJwep@google.com>
Date: Mon, 15 Sep 2025 14:22:39 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Tom Lendacky <thomas.lendacky@....com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Mathias Krause <minipli@...ecurity.net>, John Allen <john.allen@....com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, Chao Gao <chao.gao@...el.com>,
Maxim Levitsky <mlevitsk@...hat.com>, Xiaoyao Li <xiaoyao.li@...el.com>,
Zhang Yi Z <yi.z.zhang@...ux.intel.com>
Subject: Re: [PATCH v15 03/41] KVM: SEV: Validate XCR0 provided by guest in GHCB
On Mon, Sep 15, 2025, Tom Lendacky wrote:
> On 9/12/25 18:22, Sean Christopherson wrote:
> (successfully booted and ran some quick tests against the first 3
> patches without any issues on both an SEV-ES and SEV-SNP guest).
Nice, thanks!
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 37abbda28685..0cd77a87dd84 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3303,10 +3303,8 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >
> > svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
> >
> > - if (kvm_ghcb_xcr0_is_valid(svm)) {
> > - vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(ghcb);
> > - vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > - }
> > + if (kvm_ghcb_xcr0_is_valid(svm))
> > + __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
>
> Would a vcpu_unimpl() be approprite here if __kvm_set_xcr() returns
> something other than 0? It might help with debugging if the guest is
> doing something it shouldn't.
I don't want to use vcpu_unimpl(), because this isn't likely to occur due to lack
of KVM support. Hrm, but I see now that sev.c abuses vcpu_unimpl() for a whole
pile of things that aren't due to lack of KVM support. I can certainly appreciate
how painful it can be to debug random -EINVAL failures, but printing via
vcpu_unimpl() isn't an approach that I want to encourage as it's inherently
flawed (requires access to kernel logs, is ratelimited, and can still cause
jitter even though it's ratelimited if multiple CPUs happen to contend the printk
locks at just the wrong time).
For now, I think it's best to do nothing. Long term, I think we need to figure
out a solution for logging errors of this nature, because this is far from the
first time this sort of thing has popped up. E.g. nested VMX consistency check
failures are another case where having precise logs would be super helpful, but
telling userspace to enable tracepoints and scrape the tracefs logs (or run BPF
programs) isn't exactly a great experience.
Powered by blists - more mailing lists