Message-ID: <a1e5a8db-8382-4f52-8ef2-3b62b0c031ab@linux.ibm.com>
Date: Mon, 10 Nov 2025 13:02:11 +0100
From: Christian Borntraeger <borntraeger@...ux.ibm.com>
To: Wanpeng Li <kernellwp@...il.com>, Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, Wanpeng Li <wanpengli@...cent.com>,
        Ilya Leoshkevich <iii@...ux.ibm.com>, Mete Durlu <meted@...ux.ibm.com>
Subject: Re: [PATCH 00/10] sched/kvm: Semantics-aware vCPU scheduling for
 oversubscribed KVM

On 10.11.25 at 04:32, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@...cent.com>
> 
> This series addresses long-standing yield_to() inefficiencies in
> virtualized environments through two complementary mechanisms: a vCPU
> debooster in the scheduler and IPI-aware directed yield in KVM.
> 
> Problem Statement
> -----------------
> 
> In overcommitted virtualization scenarios, vCPUs frequently spin on locks
> held by other vCPUs that are not currently running. The kernel's
> paravirtual spinlock support detects these situations and calls yield_to()
> to boost the lock holder, allowing it to run and release the lock.
> 
> However, the current implementation has two critical limitations:
> 
> 1. Scheduler-side limitation:
> 
>     yield_to_task_fair() relies solely on set_next_buddy() to provide
>     preference to the target vCPU. This buddy mechanism only offers
>     immediate, transient preference. Once the buddy hint expires (typically
>     after one scheduling decision), the yielding vCPU may preempt the target
>     again, especially in nested cgroup hierarchies where vruntime domains
>     differ.
> 
>     This creates a ping-pong effect: the lock holder runs briefly, gets
>     preempted before completing critical sections, and the yielding vCPU
>     spins again, triggering another futile yield_to() cycle. The overhead
>     accumulates rapidly in workloads with high lock contention.

I can certainly confirm that on s390 we do see that yield_to() does not always
work as expected. Our spinlock code is lock-holder aware, so our KVM always yields
correctly, but often enough the hint is ignored or bounced back as you describe.
So I am certainly interested in that part.

I need to look more closely into the other part.
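
The ping-pong effect from the quoted problem statement can be illustrated
with a small toy model. This is only a sketch under simplifying assumptions,
not the kernel's actual pick logic: one CPU, two tasks, a one-unit vruntime
charge per slice, and a buddy hint that is honoured for a fixed number of
scheduling decisions. The names count_yields, buddy_lifetime and holder_lag
are hypothetical and do not exist in the kernel or in the series.

```python
def count_yields(critical_slices, buddy_lifetime=1, holder_lag=5):
    """Toy single-CPU model: how often the waiter calls yield_to().

    critical_slices: slices the lock holder needs before it releases the lock.
    buddy_lifetime:  scheduling decisions for which the buddy hint is honoured
                     (assumption: 1 models the transient hint described above).
    holder_lag:      how far the holder's vruntime is ahead of the waiter's,
                     i.e. how much less the plain vruntime pick prefers it.
    """
    vruntime = {"waiter": 0, "holder": holder_lag}
    remaining = critical_slices  # work left inside the critical section
    buddy_left = 0               # decisions for which the hint still applies
    yields = 0

    while remaining > 0:
        if buddy_left > 0:
            # The hint is still valid: run the lock holder.
            task = "holder"
            buddy_left -= 1
        else:
            # Hint expired: fall back to picking the lowest vruntime.
            task = min(vruntime, key=vruntime.get)

        vruntime[task] += 1
        if task == "holder":
            remaining -= 1
        else:
            # The waiter spins, sees the lock is still held, and yields again.
            yields += 1
            buddy_left = buddy_lifetime

    return yields


if __name__ == "__main__":
    print(count_yields(4))                    # hint lasts one decision
    print(count_yields(4, buddy_lifetime=4))  # hint covers the whole section
```

In this model a hint that lasts a single decision costs one yield_to() call
per slice of the critical section (4 calls for 4 slices), while a preference
that persists for the whole critical section needs only one call.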


