Date:   Thu, 23 Nov 2017 16:59:46 +0100
From:   Radim Krčmář <rkrcmar@...hat.com>
To:     Marc Haber <mh+linux-kernel@...schlus.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        "KVM-ML (kvm@...r.kernel.org)" <kvm@...r.kernel.org>,
        Wanpeng Li <kernellwp@...il.com>
Subject: Re: VMs freezing when host is running 4.14

2017-11-23 16:20+0100, Marc Haber:
> On Wed, Nov 22, 2017 at 05:43:13PM +0100, Radim Krčmář wrote:
> > 2017-11-22 16:52+0100, Marc Haber:
> > > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > > > So all guest kernels are 4.14, or also other older kernel?
> > > 
> > > Guest kernels are also 4.14, but the issue disappears when the host is
> > > downgraded to an older kernel. I therefore reckoned that the guest
> > > kernel doesn't matter, but that was before I saw the trace in the log.
> > 
> > The two most suspicious patches since 4.13 (which I assume works) are
> > 
> >   664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been
> >   injected")
> 
> That one does not revert cleanly; the line in question seems to have
> been removed a bit later.
> 
> Reject is:
> 141 [24/5001]mh@fan:~/linux/git/linux ((v4.14.1) %) $ cat arch/x86/kvm/vmx.c.rej
> --- arch/x86/kvm/vmx.c
> +++ arch/x86/kvm/vmx.c
> @@ -2516,7 +2516,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>         unsigned nr = vcpu->arch.exception.nr;
>         bool has_error_code = vcpu->arch.exception.has_error_code;
> -       bool reinject = vcpu->arch.exception.injected;
> +       bool reinject = vcpu->arch.exception.reinject;
>         u32 error_code = vcpu->arch.exception.error_code;
>         u32 intr_info = nr | INTR_INFO_VALID_MASK;

This line can be deleted, as reinject isn't used in the function.

Btw., there have already been many fixes from Liran Alon for that patch,
and your case could be the one addressed in
https://www.spinics.net/lists/kvm/msg159158.html

The patch is incorrect, but you might be able to see only its benefits.

> > and
> > 
> >   9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not Present"
> >   and "Page Ready" exceptions simultaneously")
> > 
> > please try reverting them to see if it helps,
> 
> That one reverted cleanly. I am now running the new kernel on the
> affected machine, and I think that a second machine has joined the
> market of being affected.

That one had a much lower chance of being the culprit.

> Would this matter on the host only or on the guests as well?

Only on the host.

Thanks.
