[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZGzfWQub4FQOrEtw@google.com>
Date: Tue, 23 May 2023 08:44:25 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Alexey Kardashevskiy <aik@....com>
Cc: kvm@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
Tom Lendacky <thomas.lendacky@....com>,
Pankaj Gupta <pankaj.gupta@....com>,
Nikunj A Dadhania <nikunj@....com>,
Santosh Shukla <santosh.shukla@....com>,
Carlos Bilbao <carlos.bilbao@....com>
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>
>
> On 23/5/23 09:39, Sean Christopherson wrote:
> > On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> > > Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> > > to/from a VM. Changing those registers inside a running SEV VM
> > > triggered #VMEXIT to KVM.
> >
> > Please, please don't make it sound like some behavior is *the* one and only behavior.
> > *KVM* *chooses* to intercept debug register accesses. Though I would omit this
> > paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
> > a basic understanding of how KVM deals with DRs and #DBs.
>
> Out of curiosity - is ARM similar in this regard, interceptions and stuff?
Yes. The granularity of interceptions varies based on the architectural revision,
and presumably there are things that always trap. But the core concept of letting
software decide what to intercept is the same.
> > > SEV-ES added encrypted state (ES) which uses an encrypted page
> > > for the VM state (VMSA). The hardware saves/restores certain registers
> > > on VMRUN/VMEXIT according to a swap type (A, B, C), see
> > > "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> > > volume 2.
> > >
> > > The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> >
> > Please rewrite this to just state what the behavior is instead of "Type A" vs.
> > "Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
> > enough to justify obfuscating super simple concepts.
> >
> > Actually, this feature really has nothing to do with Type A vs. Type B, since
> > that's purely about host state.
>
> Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
> then the host looses debug state because of the working feature.
>
> > I assume the switch from Type A to Type B is a
> > side effect, or an opportunistic optimization?
>
> There is no switch. DR[67] were and are one type, and the other DRs were not
> swapped and now are, but with a different Swap Type.
Ah, this is what I missed.
> And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
> explaining why is that.
>
> > Regardless, something like this is a lot easier for a human to read. It's easy
> > enough to find DebugSwap in the APM (literally took me longer to find my copy of
> > the PDF).
>
> It is also easy to find "Swap Types" in the APM...
Redirecting readers to specs for gory details is fine. Redirecting for basic,
simple functionality is not. Maybe the swap types will someday be common knowledge
in KVM, and an explanation will no longer be necessary, but we are nowhere near
that point.
> > Add support for "DebugSwap for SEV-ES guests", which provides support
> > for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
> > allows KVM to expose debug capabilities to SEV-ES guests. Without
> > DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
>
> But it does save/load DR6 and DR7.
>
> > and KVM cannot manually context switch guest DRs due the VMSA being
> > encrypted.
> >
> > Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
> > which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
> > vectoring a #DB.
>
> (english question) What does "vectoring" mean here precisely? Handling?
> Processing?
It's not really an English thing. "Vectoring" is, or at least was, Intel terminology
for describing the flow from the initial detection of an exception to the exception's
delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
information on the stack, fetching the software exception handler, etc. Importantly,
it describes the period where there are no instructions retired and thus ucode doesn't
open event windows, i.e. where triggering another, non-contributory exception will
effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).
> > the host by putting the CPU into an infinite loop of vectoring #DBs
> > (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
>
> Good one, thanks for the link.
>
> >
> > Set the features bit in sev_es_sync_vmsa() which is the last point
> > when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> > init happens) is called not only when VCPU is initialized but also on
> > intrahost migration when VMSA is encrypted.
> >
> > > guests, but a new feature is available, identified via
> > > CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> > > support for swapping additional debug registers. DR[0-3] and
> > > DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> > > is set.
> > >
> > > Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> > > ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> > > supported by the SOC as otherwise a malicious SEV-ES guest can set up
> > > data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> > > Set the features bit in sev_es_sync_vmsa() which is the last point
> > > when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> > > init happens) is called not only when VCPU is initialized but also on
> > > intrahost migration when VMSA is encrypted.
> > >
> > > Eliminate DR7 and #DB intercepts as:
> > > - they are not needed when DebugSwap is supported;
> >
> > "not needed" isn't sufficient justification. KVM doesn't strictly need to do a
> > lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
> > is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
> > this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
> > when NoNestedDataBp is support. Presumably the answer is "because it's simpler
> > than toggling #DB interception for guest_debug.
>
> TBH I did not have a good answer but from the above I'd say we want to
> disable #DB intercepts in a separate patch, just as you say below.
>
> > Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> > separate patch? KVM can still inject #DBs for SEV-ES guests, no?
>
> Sorry for my ignorance but what is the point of injecting #DB if there is no
> way of changing the guest's DR7?
Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
"What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
with DebugSwap, there is no point, which is why I agree that KVM should disable
interception in that case. What I'm calling out is that disabling #Db interception
isn't _necessary_ for correctness (unless I'm missing something), which means that
it can and should go in a separate patch.
> > As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
> > and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
> >
> > > - #VC for these intercepts is most likely not supported anyway and
> > > kills the VM.
> >
> > I don't see how this is relevant or helpful. What the guest may or may not do in
> > response to a #VC on a DR7 write has no bearing on KVM's behavior.
>
> Readers of the patch may wonder of the chances of breaks guests. Definitely
> helps me to realize there is likely to be some sort of panic() in the guest
> if such intercept happens.
But that's irrelevant. Intercepting DR7 writes will break the guest regardless
of how the guest deals with the #VC. If the guest eats the #VC and continues on,
the debug capabilities expected by the guest will still be missing, i.e. KVM has
broken the functionality of the guest. I am total ok if the changelog describes
the _possible_ scenarios (within reason), e.g. that the guest will either panic
on an unexpected #VC or lose debug capabilities that were promised. What I'm
objecting to is speculating on what the guest is _likely_ to do, and especially
using that speculation as justification for doing something in KVM.
> > > Signed-off-by: Alexey Kardashevskiy <aik@....com>
> > > Reviewed-by: Tom Lendacky <thomas.lendacky@....com>
> > > ---
> >
> > ...
> >
> > > @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > > /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> > > hostsa->xss = host_xss;
> > > +
> > > + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> >
> > Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
> > Can you fold the below somewhere before this patch, and then tweak this comment
> > to say something like:
> >
> > /*
> > * If DebugSwap is enabled, debug registers are loaded but NOT saved by
> > * the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
> > * saves and loads debug registers (Type-A).
> > */
>
> Sure but...
>
> >
> > [*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@google.com/
> >
> >
> > From: Sean Christopherson <seanjc@...gle.com>
> > Date: Mon, 22 May 2023 16:29:52 -0700
> > Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
> > about swap types
>
>
> ... I am missing the point here. You already wrote the patch below which 1)
> you are happy about 2) you can push out right away to the upstream. Are you
> suggesting that I repost it?
I am asking you to include it in your series because the comment I suggested above
(for DebugSwap) loosely depends on the revamped comment for sev_es_prepare_switch_to_guest()
as a whole. I would like to settle on the exact wording for all of the comments
in sev_es_prepare_switch_to_guest() in a single series instead of having disjoint
threads.
> > Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
> > swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
> > needs to save a seemingly random subset of host state, and to provide a
> > decoder for the APM's Type-A/B/C terminology.
> >
> > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > ---
> > arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
> > 1 file changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 69ae5e1b3120..1e0e9b08e491 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
> > void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > {
> > /*
> > - * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
> > - * of which one step is to perform a VMLOAD. KVM performs the
> > - * corresponding VMSAVE in svm_prepare_guest_switch for both
> > - * traditional and SEV-ES guests.
>
>
> When I see "traditional", I assume there was one single x86 KVM before say
> 2010 which was slow, emulated a lot but worked exactly the same on Intel and
> AMD. Which does not seem to ever be the case. May be say "SVM" here?
This is the code being removed. I agree that "traditional" is ambiguous, which
is why I want to delete it :-)
> > + * All host state for SEV-ES guests is categorized into three swap types
> > + * based on how it is handled by hardware during a world switch:
> > + *
> > + * A: VMRUN: Host state saved in host save area
> > + * VMEXIT: Host state loaded from host save area
> > + *
> > + * B: VMRUN: Host state _NOT_ saved in host save area
> > + * VMEXIT: Host state loaded from host save area
> > + *
> > + * C: VMRUN: Host state _NOT_ saved in host save area
> > + * VMEXIT: Host state initialized to default(reset) values
> > + *
> > + * Manually save type-B state, i.e. state that is loaded by VMEXIT but
> > + * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
> > + * by common SVM code).
Powered by blists - more mailing lists