lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com>
Date: Tue, 16 Jan 2024 16:37:19 +0100
From: Friedrich Weber <f.weber@...xmox.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer

Hi Sean,

On 11/01/2024 17:00, Sean Christopherson wrote:
> This is a known issue.  It's mostly a KVM bug[...] (fix posted[...]), but I suspect
> that a bug in the dynamic preemption model logic[...] is also contributing to the
> behavior by causing KVM to yield on preempt models where it really shouldn't.

I tried the following variants now, each applied on top of 6.7 (0dd3ee31):

* [1], the initial patch series mentioned in the bugreport ("[PATCH 0/2]
KVM: Pre-check mmu_notifier retry on x86")
* [2], its v2 that you linked above ("[PATCH v2] KVM: x86/mmu: Retry
fault before acquiring mmu_lock if mapping is changing")
* [3], the scheduler patch you linked above ("[PATCH] sched/core: Drop
spinlocks on contention iff kernel is preemptible")
* both [2] & [3]

My kernel is PREEMPT_DYNAMIC and, according to
/sys/kernel/debug/sched/preempt, defaults to preempt=voluntary. For case
[3], I additionally tried manually switching to preempt=full.

Provided I did not mess up, I get the following results for the
reproducer I posted:

* [1] (the initial patch series): no hangs
* [2] (its v2): hangs
* [3] (the scheduler patch) with preempt=voluntary: no hangs
* [3] (the scheduler patch) with preempt=full: hangs
* [2] & [3]: no hangs

So it seems like:

* [1] (the initial patch series) fixes the hangs, which is consistent
with the feedback in the bugreport [4].
* But weirdly, its v2 [2] does not fix the hangs.
* As long as I stay with preempt=voluntary, [3] (the scheduler patch)
alone is already enough to fix the hangs in my case -- this I did not
expect :)

Does this make sense to you? Happy to double-check or run more tests if
anything seems off.

Best wishes,

Friedrich

[1] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
[2] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
[3] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
[4] https://bugzilla.kernel.org/show_bug.cgi?id=218259#c6


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ