lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8f6bc21-c4e6-40de-838a-d374adb4e888@proxmox.com>
Date: Wed, 17 Jan 2024 14:09:28 +0100
From: Friedrich Weber <f.weber@...xmox.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer

On 16/01/2024 18:20, Sean Christopherson wrote:
>> Does this make sense to you? Happy to double-check or run more tests if
>> anything seems off.
>  
> Ha!  It too me a few minutes to realize what went sideways with v2.  KVM has an
> in-flight change that switches from host virtual addresses (HVA) to guest physical
> frame numbers (GFN) for the retry check, commit 8569992d64b8 ("KVM: Use gfn instead
> of hva for mmu_notifier_retry").
> 
> That commit is in the KVM pull request for 6.8, and so v2 is based on top of a
> branch that contains said commit.  But for better or worse (probably worse), the
> switch from HVA=GFN didn't change the _names_ of mmu_invalidate_range_{start,end},
> only the type.  So v2 applies and compiles cleanly on 6.7, but it's subtly broken
> because checking for a GFN match against an HVA range is all but guaranteed to get
> false negatives.

Oof, that's nifty, good catch! I'll pay more attention to the
base-commit when testing next time. :)

> If you can try v2 on top of `git://git.kernel.org/pub/scm/virt/kvm/kvm.git next`,
> that would be helpful to confirm that I didn't screw up something else.

Pulled that repository and can confirm:

* 1c6d984f ("x86/kvm: Do not try to disable kvmclock if it was not
enabled", current `next`): reproducer hangs
* v2 [1] ("KVM: x86/mmu: Retry fault before acquiring mmu_lock if
mapping is changing") applied on top of 1c6d984f: no hangs anymore

If I understand the discussion on [1] correctly, there might be a v3 --
if so, I'll happily test that too.

> Thanks very much for reporting back!  I'm pretty sure we would have missed the
> semantic conflict when backporting the fix to 6.7 and earlier, i.e. you likely
> saved us from another round of bug reports for various stable trees.

Sure! Thanks a lot for taking a look at this!

Best wishes,

Friedrich

[1] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ