linux-kernel - Re: [PATCH] sched/core: Drop spinlocks on contention iff kernel is preemptible

Message-ID: <34292692-67e9-4132-be1c-eba79dd3a84f@proxmox.com>
Date: Thu, 1 Feb 2024 16:22:11 +0100
From: Friedrich Weber <f.weber@...xmox.com>
To: Sean Christopherson <seanjc@...gle.com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org,
 Valentin Schneider <valentin.schneider@....com>,
 Marco Elver <elver@...gle.com>, Frederic Weisbecker <frederic@...nel.org>,
 David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH] sched/core: Drop spinlocks on contention iff kernel is
 preemptible

On 10/01/2024 22:47, Sean Christopherson wrote:
> Use preempt_model_preemptible() to detect a preemptible kernel when
> deciding whether or not to reschedule in order to drop a contended
> spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
> built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
> preemption model is "none" or "voluntary".  In short, make kernels with
> dynamically selected models behave the same as kernels with statically
> selected models.
> 
> Somewhat counter-intuitively, NOT yielding a lock can provide better
> latency for the relevant tasks/processes.  E.g. KVM x86's mmu_lock, a
> rwlock, is often contended between an invalidation event (takes mmu_lock
> for write) and a vCPU servicing a guest page fault (takes mmu_lock for
> read).  For _some_ setups, letting the invalidation task complete even
> if there is mmu_lock contention provides lower latency for *all* tasks,
> i.e. the invalidation completes sooner *and* the vCPU services the guest
> page fault sooner.
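
For readers following along, the behavioural change described in the quoted
commit message can be modelled in a few lines of userspace C. This is only a
rough sketch, not the patch itself: the enum and all function names below are
hypothetical stand-ins, and it assumes that only the "full" model counts as
preemptible, with "none" and "voluntary" (the two named above) counting as
non-preemptible.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the live preemption model selected at boot.
 * These are NOT the kernel's own definitions.
 */
enum preempt_model {
	MODEL_NONE,
	MODEL_VOLUNTARY,
	MODEL_FULL,
};

/*
 * Assumption for this sketch: only the "full" model is preemptible.
 * Loosely models what preempt_model_preemptible() is used to detect.
 */
bool model_is_preemptible(enum preempt_model live)
{
	return live == MODEL_FULL;
}

/*
 * Behaviour before the patch, as described: the decision depends only on
 * the build-time PREEMPTION setting. PREEMPT_DYNAMIC=y selects PREEMPTION,
 * so a contended lock is yielded regardless of the live model.
 */
bool needbreak_build_time(bool config_preemption, bool lock_contended)
{
	return config_preemption && lock_contended;
}

/*
 * Behaviour after the patch, as described: the decision follows the live
 * model, so "none" and "voluntary" keep holding a contended lock.
 */
bool needbreak_live_model(enum preempt_model live, bool lock_contended)
{
	return model_is_preemptible(live) && lock_contended;
}

/* Small self-check covering the cases discussed in the commit message. */
void self_check(void)
{
	/* PREEMPT_DYNAMIC=y build: the old check always yields on contention. */
	assert(needbreak_build_time(true, true));

	/* New check: yield on contention only when the live model is "full". */
	assert(!needbreak_live_model(MODEL_NONE, true));
	assert(!needbreak_live_model(MODEL_VOLUNTARY, true));
	assert(needbreak_live_model(MODEL_FULL, true));

	/* An uncontended lock is never dropped early. */
	assert(!needbreak_live_model(MODEL_FULL, false));
}
```

In this model, a PREEMPT_DYNAMIC kernel booted with preempt=voluntary goes
from "yield on contention" to "keep the lock", which is the configuration
tested below.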

I've been testing this patch for some time now:

Applied on top of Linux 6.7 (0dd3ee31) on a PREEMPT_DYNAMIC kernel with
preempt=voluntary, it fixes an issue for me where KVM guests would
temporarily freeze if NUMA balancing and KSM are active on a NUMA host.
See [1] for more details.

In addition, I've been running this patch on my (non-NUMA) workstation
with (admittedly fairly light) VM workloads for two weeks now and
haven't noticed any negative effects so far (though this is on top of a
modified 6.5.11 kernel).

Side note: the patch no longer applies on 6.8-rc2; it seems sched.h was
refactored in the meantime.

[1] https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com/