Message-ID: <ac98150acd77f4c09167bc1bb1c552db68925cf2.camel@redhat.com>
Date: Wed, 23 Jun 2021 16:07:50 +0300
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, kvm@...r.kernel.org
Cc: Sean Christopherson <seanjc@...gle.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Cathy Avery <cavery@...hat.com>,
Emanuele Giuseppe Esposito <eesposit@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] KVM: nSVM: Fix L1 state corruption upon return from
SMM
On Wed, 2021-06-23 at 16:01 +0300, Maxim Levitsky wrote:
> On Wed, 2021-06-23 at 11:39 +0200, Paolo Bonzini wrote:
> > On 23/06/21 09:44, Vitaly Kuznetsov wrote:
> > > - RFC: I'm not 100% sure my 'smart' idea to use currently-unused HSAVE area
> > > is that smart. Also, we don't even seem to check that L1 set it up upon
> > > nested VMRUN so hypervisors which don't do that may remain broken. A very
> > > much needed selftest is also missing.
> >
> > It's certainly a bit weird, but I guess it counts as smart too. It
> > needs a few more comments, but I think it's a good solution.
> >
> > One could delay the backwards memcpy until vmexit time, but that would
> > require a new flag so it's not worth it for what is a pretty rare and
> > already expensive case.
> >
> > Paolo
> >
>
> Hi!
>
> I did some homework on this and would like to share a few of my thoughts:
>
> First of all, what caught my attention is the way we intercept #SMI
> (this isn't 100% related to the bug, but still worth talking about IMHO).
>
> A. Bare metal: It looks like SVM allows intercepting SMI (SVM_EXIT_SMI),
> with the intention of then entering the BIOS SMM handler manually using the SMM_CTL MSR.
> On bare metal we do set INTERCEPT_SMI, but we emulate the exit as a nop.
> I guess on bare metal there are some undocumented bits that the BIOS sets which
> make the CPU ignore that SMI intercept and still take the #SMI handler normally,
> but I wonder if we could still break some motherboard
> code due to that.
>
>
> B. Nested: If #SMI is intercepted, then it causes nested VMEXIT.
> Since KVM does enable SMI intercept, when it runs nested it means that all SMIs
> that nested KVM gets are emulated as NOP, and L1's SMI handler is not run.
>
>
> About the issue that was fixed in this patch. Let me try to understand how
> it would work on bare metal:
>
> 1. A guest is entered. Host state is saved to VM_HSAVE_PA area (or stashed somewhere
> in the CPU)
>
> 2. #SMI (without intercept) happens
>
> 3. CPU has to exit SVM and start running the host SMI handler. It loads the SMM
> state without touching VM_HSAVE_PA, runs the SMI handler, then once it RSMs,
> it restores the guest state from the SMM area and continues the guest.
>
> 4. Once a normal VMexit happens, the host state is restored from VM_HSAVE_PA
>
> So host state indeed can't be saved to vmcb01.
>
> To be honest, I think I would prefer not to use L1's hsave area, but rather add back our
> 'hsave' in KVM and always store the L1 host state there on nested entry.
>
> This way we will avoid touching the vmcb01 at all and both solve the issue and
> reduce code complexity.
> (The copying of L1 host state to what is basically the L1 guest state area and back
> even has a comment explaining why it is (was) possible to do so,
> before you discovered that this doesn't work with SMM.)

I need more coffee today. The comment is somewhat wrong, actually.
When L1 switches to L2, its HSAVE area holds L1 guest state, but
L1 is a "host" vs L2, so it is host state.
The copying is more between KVM's register cache and the vmcb.
So maybe backing it up as this patch does is the best solution yet.
I will take a more in-depth look at this soon.
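
To make the bare-metal sequence quoted above (steps 1-4) easier to reason
about, here is a toy model of it. This is only a sketch, not KVM code and
not real SVM semantics: ToyCpu, its fields and bare_metal_sequence() are
made-up names, and "hsave"/"smram" merely stand in for the VM_HSAVE_PA area
and the SMM state-save area. It assumes the host state lives in one slot
that only VMRUN writes and only a normal VMEXIT consumes.

```python
class ToyCpu:
    """Toy model of VMRUN / #SMI / RSM / VMEXIT state handling (illustrative only)."""

    def __init__(self, host_regs):
        self.regs = dict(host_regs)  # currently loaded register state
        self.hsave = None            # stand-in for the VM_HSAVE_PA area
        self.smram = None            # stand-in for the SMM state-save area
        self.in_guest = False

    def vmrun(self, guest_regs):
        # Step 1: host state is stashed, guest state is loaded.
        self.hsave = dict(self.regs)
        self.regs = dict(guest_regs)
        self.in_guest = True

    def smi(self, handler_regs):
        # Steps 2-3: the interrupted (guest) state goes to the SMM area;
        # hsave is deliberately left untouched.
        self.smram = (dict(self.regs), self.in_guest)
        self.regs = dict(handler_regs)
        self.in_guest = False

    def rsm(self):
        # Step 3 (end): guest state comes back from the SMM area.
        self.regs, self.in_guest = self.smram
        self.smram = None

    def vmexit(self):
        # Step 4: host state is restored from hsave.
        self.regs = self.hsave
        self.hsave = None
        self.in_guest = False


def bare_metal_sequence():
    """Run steps 1-4 and record (step, current rip, hsave still populated)."""
    cpu = ToyCpu({"rip": "host"})
    trace = []
    for step, action in [
        ("vmrun", lambda: cpu.vmrun({"rip": "guest"})),
        ("smi", lambda: cpu.smi({"rip": "smm"})),
        ("rsm", cpu.rsm),
        ("vmexit", cpu.vmexit),
    ]:
        action()
        trace.append((step, cpu.regs["rip"], cpu.hsave is not None))
    return trace
```

Calling bare_metal_sequence() walks through the four steps; the only point
of the model is that the saved host state has to survive the SMM round trip
unchanged, whichever area ends up holding it.
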
Best regards,
Maxim Levitsky
>
> Thanks again for fixing this bug!
>
> Best regards,
> Maxim Levitsky