Message-ID: <xhsmho6smhrgo.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Mon, 11 Aug 2025 16:18:31 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Xin Zhao
<jackzxcui1989@....com>
Cc: tj@...nel.org, hannes@...xchg.org, mkoutny@...e.com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, will@...nel.org, boqun.feng@...il.com,
longman@...hat.com, clrkwllms@...nel.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH] sched/cgroup: Lock optimize for cgroup cpu throttle

On 11/08/25 10:36, Sebastian Andrzej Siewior wrote:
> On 2025-08-11 15:08:38 [+0800], Xin Zhao wrote:
>> After enabling PREEMPT_RT, ordinary spinlocks can also be subject to cgroup
>> limits during the lock-holding period. This can lead to seemingly unrelated
>> threads experiencing timing dependencies due to underlying logic, such as
>> memory allocation, resulting in delayed wake-up behaviors that are difficult
>> to understand when analyzing traces captured by tools like Perfetto.
>>
>> Due to the prevalence of this performance issue when using cgroup CPU
>> throttling with PREEMPT_RT, the CGROUP_LOCK_OPTIMIZE configuration will be
>> enabled by default when both PREEMPT_RT and CFS_BANDWIDTH are activated.
>>
>> This configuration option temporarily increases the priority of tasks to
>> SCHED_RR 1 if they hold a lock (excluding raw spinlocks, RCU, and seqlock)
>> and are limited by cgroup, provided they are SCHED_NORMAL. Once the lock is
>> released, the priority will be restored.
>>
>> This patch is a derivative of the priority inheritance patch. While priority
>> inheritance can cover scenarios involving spinlocks and mutexes, it cannot
>> address the timing dependency issues between two SCHED_NORMAL tasks caused
>> by underlying locks. Additionally, the lazy_preempt feature does not cover
>> scenarios where a real-time task, such as a ktimer, interrupts a lock-holding
>> SCHED_NORMAL task, which is then throttled by cgroup cpu.
>>
>> This patch not only addresses the issue of cgroup limits affecting spinlocks
>> under PREEMPT_RT but also resolves issues related to holding mutex or
>> semaphore locks, as well as other core rt_mutex locks under PREEMPT_RT.
>>
>> The following stack trace illustrates the delayed wake-up behavior caused by
>> two seemingly unrelated threads due to underlying logic:
>
> urgh.
>
> What about using task_work_add() and throttling the task on its way to
> userland? The callback will be invoked without any locks held.
>
Yeah, please have a look at:
https://lore.kernel.org/lkml/20250715071658.267-1-ziqianlu@bytedance.com/
> Sebastian
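
[Archive note] The idea raised above, deferring the throttle until the task
heads back to userland so that it is never parked while holding a lock, can be
illustrated with a small userspace toy model. This is only a sketch under
stated assumptions: it is not kernel code, it does not call task_work_add(),
and every name in it (toy_task, toy_quota_expired, toy_return_to_user, and so
on) is hypothetical and invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a task. All fields are hypothetical, not kernel structures. */
struct toy_task {
    int  locks_held;        /* number of sleeping locks currently held   */
    bool throttle_pending;  /* quota ran out, throttle has been deferred */
    bool throttled;         /* task has actually been parked             */
};

static void toy_lock(struct toy_task *t)   { t->locks_held++; }
static void toy_unlock(struct toy_task *t) { t->locks_held--; }

/* Quota expired: do not park the task here, only record the request. */
static void toy_quota_expired(struct toy_task *t)
{
    t->throttle_pending = true;
}

/*
 * Stand-in for a deferred callback that runs on the way back to userland.
 * At that point the task holds no locks, so parking it cannot stall
 * other tasks waiting on a lock.
 */
static void toy_return_to_user(struct toy_task *t)
{
    assert(t->locks_held == 0);
    if (t->throttle_pending) {
        t->throttle_pending = false;
        t->throttled = true;
    }
}

/* Walk through one scenario; returns true if the throttle was applied
 * only after the lock had been dropped. */
static bool toy_run_scenario(void)
{
    struct toy_task t = {0};

    toy_lock(&t);
    toy_quota_expired(&t);          /* quota runs out while lock is held */
    bool parked_under_lock = t.throttled;
    toy_unlock(&t);
    toy_return_to_user(&t);         /* deferred throttle takes effect    */

    return !parked_under_lock && t.throttled;
}
```

In this model the lock holder keeps running until it drops the lock, and the
throttle only takes effect at the simulated return-to-user point. Whether and
how the linked series implements this is for that thread to say; the model
only shows the ordering being discussed.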