Message-ID: <c4a1bcea-fb00-6f3f-6bf6-d876393190e4@gmail.com>
Date: Tue, 14 Oct 2025 15:43:10 +0800
From: Hao Jia <jiahao.kernel@...il.com>
To: Aaron Lu <ziqianlu@...edance.com>,
Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Chengming Zhou <chengming.zhou@...ux.dev>, Josh Don <joshdon@...gle.com>,
Ingo Molnar <mingo@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>
Cc: linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chuyi Zhou <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>,
Songtang Liu <liusongtang@...edance.com>, Chen Yu <yu.c.chen@...el.com>,
Matteo Martelli <matteo.martelli@...ethink.co.uk>,
Michal Koutný <mkoutny@...e.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with
zero runtime_remaining
Hello Aaron,
On 2025/9/29 15:46, Aaron Lu wrote:
> When a cfs_rq is to be throttled, its limbo list should be empty and
> that's why there is a warn in tg_throttle_down() for non empty
> cfs_rq->throttled_limbo_list.
>
> When running a test with the following hierarchy:
>
>          root
>         /    \
>       A*      ...
>     / | \       ...
>       B
>      / \
>    C*
>
> where both A and C have quota settings, that warn on non empty limbo list
> is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
> part of the cfs_rq for the sake of simpler representation).
>
I encountered a similar warning a while ago and fixed it, and I have a
question. tg_unthrottle_up(cfs_rq_c) calls enqueue_task_fair(p) to
enqueue each task on the limbo list, and for that enqueue not to be
throttled, runtime_remaining must be greater than 0 at every level of
task p's task_group hierarchy.
Apart from the case you fixed above, is it possible that, while
bandwidth control is running normally, there is a corner case where
cfs_rq_a->runtime_remaining > 0 but cfs_rq_b->runtime_remaining < 0,
which could trigger a similar warning?
So I previously tried to fix this issue with the patch below, which adds
an ENQUEUE_THROTTLE flag so that tasks enqueued in tg_unthrottle_up()
are not throttled.
---
kernel/sched/fair.c | 6 ++++--
kernel/sched/sched.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df8dc389af8e..128efa2eba57 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5290,7 +5290,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	se->on_rq = 1;
 
 	if (cfs_rq->nr_queued == 1) {
-		check_enqueue_throttle(cfs_rq);
+		if (!(flags & ENQUEUE_THROTTLE))
+			check_enqueue_throttle(cfs_rq);
+
 		list_add_leaf_cfs_rq(cfs_rq);
 #ifdef CONFIG_CFS_BANDWIDTH
 		if (cfs_rq->pelt_clock_throttled) {
@@ -5905,7 +5907,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
 		list_del_init(&p->throttle_node);
 		p->throttled = false;
-		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
 	}
 
 	/* Add cfs_rq with load or one or more already running entities to the list */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b5367c514c14..871dfb761676 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2358,6 +2358,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define ENQUEUE_MIGRATING	0x100
 #define ENQUEUE_DELAYED		0x200
 #define ENQUEUE_RQ_SELECTED	0x400
+#define ENQUEUE_THROTTLE	0x800
 
 #define RETRY_TASK		((void *)-1UL)
 
---
Unfortunately, I built some tests locally but was not able to reproduce
this corner case.
Thanks,
Hao