Message-ID: <e8f6bc21-c4e6-40de-838a-d374adb4e888@proxmox.com>
Date: Wed, 17 Jan 2024 14:09:28 +0100
From: Friedrich Weber <f.weber@...xmox.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
On 16/01/2024 18:20, Sean Christopherson wrote:
>> Does this make sense to you? Happy to double-check or run more tests if
>> anything seems off.
>
> Ha! It took me a few minutes to realize what went sideways with v2. KVM has an
> in-flight change that switches from host virtual addresses (HVA) to guest physical
> frame numbers (GFN) for the retry check, commit 8569992d64b8 ("KVM: Use gfn instead
> of hva for mmu_notifier_retry").
>
> That commit is in the KVM pull request for 6.8, and so v2 is based on top of a
> branch that contains said commit. But for better or worse (probably worse), the
> switch from HVA=>GFN didn't change the _names_ of mmu_invalidate_range_{start,end},
> only the type. So v2 applies and compiles cleanly on 6.7, but it's subtly broken
> because checking for a GFN match against an HVA range is all but guaranteed to get
> false negatives.
Oof, that's nifty, good catch! I'll pay more attention to the
base-commit when testing next time. :)
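
For anyone following along, here is a minimal sketch of why the mismatch
silently produces false negatives. It is not the actual KVM code: the struct,
the helper name `must_retry`, and the address values in the usage note are
made up for illustration. The only assumption carried over from the thread is
that the retry check is a plain range comparison and that on 6.7 the range
bounds are stored as HVAs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the in-progress invalidation range.
 * On 6.7 the bounds are host virtual addresses; after commit
 * 8569992d64b8 they are guest frame numbers. The field names stay
 * the same either way, which is why the patch still compiles.
 */
struct invalidate_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Returns true if addr lies inside the range being invalidated,
 * i.e. the page fault has to be retried. The comparison is only
 * meaningful if addr is in the same units as the range bounds.
 */
static bool must_retry(const struct invalidate_range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}
```

With a range covering one page at an example HVA of 0x7f0000200000,
`must_retry()` returns true for that HVA, but false for a small number such as
an example GFN of 0x100 for the same page, because a frame number is far below
any userspace virtual address. The check then never asks for a retry, which
matches the "reproducer still hangs" result seen with v2 on 6.7.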
> If you can try v2 on top of `git://git.kernel.org/pub/scm/virt/kvm/kvm.git next`,
> that would be helpful to confirm that I didn't screw up something else.
Pulled that repository and can confirm:
* 1c6d984f ("x86/kvm: Do not try to disable kvmclock if it was not
enabled", current `next`): reproducer hangs
* v2 [1] ("KVM: x86/mmu: Retry fault before acquiring mmu_lock if
mapping is changing") applied on top of 1c6d984f: no hangs anymore
If I understand the discussion on [1] correctly, there might be a v3 --
if so, I'll happily test that too.
> Thanks very much for reporting back! I'm pretty sure we would have missed the
> semantic conflict when backporting the fix to 6.7 and earlier, i.e. you likely
> saved us from another round of bug reports for various stable trees.
Sure! Thanks a lot for taking a look at this!
Best wishes,
Friedrich
[1] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/