lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 10 Aug 2022 14:00:23 +0200
From:   Thomas Lamprecht <t.lamprecht@...xmox.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org
Cc:     Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        linux-kernel@...r.kernel.org, Wanpeng Li <wanpengli@...cent.com>,
        Ingo Molnar <mingo@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>, x86@...nel.org,
        Jim Mattson <jmattson@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, Joerg Roedel <joro@...tes.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v3 00/13] SMM emulation and interrupt shadow fixes

On 03/08/2022 17:49, Maxim Levitsky wrote:
> This patch series is a result of long debug work to find out why
> sometimes guests with win11 secure boot
> were failing during boot.
> 
> During writing a unit test I found another bug, turns out
> that on rsm emulation, if the rsm instruction was done in real
> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> 
> I also refactored the way we write SMRAM so it is easier
> now to understand what is going on.
> 
> The main bug in this series which I fixed is that we
> allowed #SMI to happen during the STI interrupt shadow,
> and we did nothing to both reset it on #SMI handler
> entry and restore it on RSM.
> 
> V3: addressed most of the review feedback from Sean (thanks!)
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (13):
>   bug: introduce ASSERT_STRUCT_OFFSET
>   KVM: x86: emulator: em_sysexit should update ctxt->mode
>   KVM: x86: emulator: introduce emulator_recalc_and_set_mode
>   KVM: x86: emulator: update the emulation mode after rsm
>   KVM: x86: emulator: update the emulation mode after CR0 write
>   KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
>     the image format
>   KVM: x86: emulator/smm: add structs for KVM's smram layout
>   KVM: x86: emulator/smm: use smram structs in the common code
>   KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
>   KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
>   KVM: x86: SVM: use smram structs
>   KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
>     capable
>   KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
> 
>  arch/x86/include/asm/kvm_host.h |  11 +-
>  arch/x86/kvm/emulate.c          | 305 +++++++++++++++++---------------
>  arch/x86/kvm/kvm_emulate.h      | 223 ++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.c          |  30 ++--
>  arch/x86/kvm/vmx/vmcs12.h       |   5 +-
>  arch/x86/kvm/vmx/vmx.c          |   4 +-
>  arch/x86/kvm/x86.c              | 175 +++++++++---------
>  include/linux/build_bug.h       |   9 +
>  8 files changed, 497 insertions(+), 265 deletions(-)
> 

FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption
required (mostly unrelated context change) and now also updated that backport
to the v3 of this patch series.

Our reproducer got fixed with either, but v3 now also avoids triggering logs like:

 Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
 Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
 Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns
 Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns

which happened earlier (not sure how deep that correlates with the v2 vs. v3, but
it stuck out, so mentioning for sake of completeness).

For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in
the SMRAM image depends on the image format", as that constant was there yet and
the actual values stayed the same for our case FWICT and adapted to slight context
changes for the others.

So, the approach seems to fix our issue and we are already rolling out a kernel
to users for testing and got positive feedback there too.

With above in mind:

Tested-by: Thomas Lamprecht <t.lamprecht@...xmox.com>

It would be also great to see this backported to still supported upstream stable kernels
from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least
increasing the chance of our reproducer to trigger dramatically.

thx & cheers
Thomas

Powered by blists - more mailing lists