Message-ID: <866df4004138ba18c6503266b61661a2ed8536c6.camel@redhat.com>
Date: Wed, 10 Aug 2022 16:25:29 +0300
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Thomas Lamprecht <t.lamprecht@...xmox.com>, kvm@...r.kernel.org
Cc: Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
linux-kernel@...r.kernel.org, Wanpeng Li <wanpengli@...cent.com>,
Ingo Molnar <mingo@...hat.com>,
Sean Christopherson <seanjc@...gle.com>, x86@...nel.org,
Jim Mattson <jmattson@...gle.com>,
Kees Cook <keescook@...omium.org>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, Joerg Roedel <joro@...tes.org>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v3 00/13] SMM emulation and interrupt shadow fixes
On Wed, 2022-08-10 at 14:00 +0200, Thomas Lamprecht wrote:
> On 03/08/2022 17:49, Maxim Levitsky wrote:
> > This patch series is a result of long debug work to find out why
> > sometimes guests with win11 secure boot
> > were failing during boot.
> >
> > While writing a unit test I found another bug: it turns out
> > that on rsm emulation, if the rsm instruction was done in real
> > or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> >
> > I also refactored the way we write SMRAM so it is easier
> > now to understand what is going on.
> >
> > The main bug which I fixed in this series is that we
> > allowed #SMI to happen during the STI interrupt shadow,
> > and we neither reset the shadow on #SMI handler
> > entry nor restored it on RSM.
> >
> > V3: addressed most of the review feedback from Sean (thanks!)
> >
> > Best regards,
> > Maxim Levitsky
> >
> > Maxim Levitsky (13):
> > bug: introduce ASSERT_STRUCT_OFFSET
> > KVM: x86: emulator: em_sysexit should update ctxt->mode
> > KVM: x86: emulator: introduce emulator_recalc_and_set_mode
> > KVM: x86: emulator: update the emulation mode after rsm
> > KVM: x86: emulator: update the emulation mode after CR0 write
> > KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
> > the image format
> > KVM: x86: emulator/smm: add structs for KVM's smram layout
> > KVM: x86: emulator/smm: use smram structs in the common code
> > KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
> > KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
> > KVM: x86: SVM: use smram structs
> > KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
> > capable
> > KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
> >
> > arch/x86/include/asm/kvm_host.h | 11 +-
> > arch/x86/kvm/emulate.c | 305 +++++++++++++++++---------------
> > arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++-
> > arch/x86/kvm/svm/svm.c | 30 ++--
> > arch/x86/kvm/vmx/vmcs12.h | 5 +-
> > arch/x86/kvm/vmx/vmx.c | 4 +-
> > arch/x86/kvm/x86.c | 175 +++++++++---------
> > include/linux/build_bug.h | 9 +
> > 8 files changed, 497 insertions(+), 265 deletions(-)
> >
>
> FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption
> required (mostly unrelated context change) and now also updated that backport
> to the v3 of this patch series.
>
> Our reproducer got fixed with either, but v3 now also avoids triggering logs like:
>
> Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
> Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
> Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns
> Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns
>
> which happened earlier (not sure how deep that correlates with the v2 vs. v3, but
> it stuck out, so mentioning for sake of completeness).
This is likely just a coincidence because V3 should not contain any functional changes vs v2.
(If I remember correctly)
>
> For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in
> the SMRAM image depends on the image format", as that constant wasn't there yet and
> the actual values stayed the same for our case FWICT and adapted to slight context
> changes for the others.
>
> So, the approach seems to fix our issue and we are already rolling out a kernel
> to users for testing and got positive feedback there too.
>
> With above in mind:
>
> Tested-by: Thomas Lamprecht <t.lamprecht@...xmox.com>
Thank you very much for testing!
>
> It would also be great to see this backported to the still-supported upstream stable kernels
> from 5.15 onwards, as the TDP MMU got enabled by default there, and that at least
> dramatically increases the chance of our reproducer triggering.
Best regards,
Maxim Levitsky
>
> thx & cheers
> Thomas
>