Message-ID: <20220727153124.1afdad67@rosa.proxmox.com>
Date:   Wed, 27 Jul 2022 15:31:24 +0200
From:   Stoiko Ivanov <s.ivanov@...xmox.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, bgardon@...gle.com,
        Jim Mattson <jmattson@...gle.com>, t.lamprecht@...xmox.com
Subject: Re: [PATCH] KVM: x86: enable TDP MMU by default

On Wed, 27 Jul 2022 13:22:48 +0300
Maxim Levitsky <mlevitsk@...hat.com> wrote:

> On Tue, 2022-07-26 at 17:43 +0200, Paolo Bonzini wrote:
> > On 7/26/22 16:57, Stoiko Ivanov wrote:  
> > > Hi,
> > > 
> > > Proxmox[0] recently switched to the 5.15 kernel series (based on the one
> > > for Ubuntu 22.04), which includes this commit.
> > > While it's working well on most installations, we have a few users who
> > > reported that some of their guests shut down with
> > > `KVM: entry failed, hardware error 0x80000021` being logged under certain
> > > conditions and environments[1]:
> > > * The issue is not deterministically reproducible, and only happens
> > >    eventually with certain loads (e.g. we have only one system in our
> > >    office which exhibits the issue - and this only by repeatedly installing
> > >    Windows 2k22 ~ one out of 10 installs will cause the guest-crash)
> > > * While most reports are referring to (newer) Windows guests, some users
> > >    run into the issue with Linux VMs as well
> > > * The affected systems are from a quite wide range - our affected machine
> > >    is an old IvyBridge Xeon with outdated BIOS (an equivalent system with
> > >    the latest available BIOS is not affected), but we have
> > >    reports of all kinds of Intel CPUs (up to an i5-12400). It seems AMD CPUs
> > >    are not affected.
> > > 
> > > Disabling tdp_mmu seems to mitigate the issue, but I still thought you
> > > might want to know that in some cases tdp_mmu causes problems, or that you
> > > even might have an idea of how to fix the issue without explicitly
> > > disabling tdp_mmu?  
> > 
> > If you don't need secure boot, you can try disabling SMM.  It should not 
> > be related to TDP MMU, but the logs (thanks!) point at an SMM entry (RIP 
> > = 0x8000, CS base=0x7ffc2000).  
> 
> No doubt about it. It is the issue.
> 
> > 
> > This is likely to be fixed by 
> > https://lore.kernel.org/kvm/20220621150902.46126-1-mlevitsk@redhat.com/.

Thanks to both of you for the quick feedback and the patches!

We ran our reproducer with the patch series above applied on top of
5.19-rc8 from
git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/kinetic:
* Without the patches, the issue occurred within 20 minutes.
* With the patches applied, the issue did not occur in 3 hours of
  testing (it usually shows up within 1-2 hours at most).

So FWIW, the series seems to fix the issue on our setup.
We'll do some more internal tests and then make the fix available
(backported to our 5.15 kernel) to our affected users.
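
For anyone who wants to check whether a host shows the same symptom, here
is a minimal Python sketch (not our actual reproducer). It scans log text
for the `KVM: entry failed, hardware error ...` message quoted above and
reads the kvm `tdp_mmu` module parameter. The function names are
hypothetical, and the sysfs path and its Y/N format are assumptions about
the running kernel; where the message gets logged depends on your setup.

```python
import re
from pathlib import Path

# Error text as quoted in the original report in this thread.
ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")


def find_entry_failures(log_text):
    """Return the error codes of all 'KVM: entry failed' messages in log_text."""
    return [m.group(1).lower() for m in ENTRY_FAILED.finditer(log_text)]


def tdp_mmu_enabled(param_path="/sys/module/kvm/parameters/tdp_mmu"):
    """Return True/False for the tdp_mmu parameter, or None if unreadable.

    Assumption: the parameter is exposed at this sysfs path and reads
    as Y or N (or 1/0).
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    return value in ("Y", "1")


if __name__ == "__main__":
    sample = "guest started\nKVM: entry failed, hardware error 0x80000021\n"
    print(find_entry_failures(sample))
    print(tdp_mmu_enabled())
```

An empty list from find_entry_failures() only means the message was not
in the text you passed in, not that the host is unaffected.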

Kind regards,
stoiko

