Message-ID: <20260120032549.186733-1-quzicheng@huawei.com>
Date: Tue, 20 Jan 2026 03:25:49 +0000
From: Zicheng Qu <quzicheng@...wei.com>
To: <kprateek.nayak@....com>
CC: <bsegall@...gle.com>, <dhaval@...ux.vnet.ibm.com>,
<dietmar.eggemann@....com>, <juri.lelli@...hat.com>,
<linux-kernel@...r.kernel.org>, <mgorman@...e.de>, <mingo@...hat.com>,
<peterz@...radead.org>, <quzicheng@...wei.com>, <rostedt@...dmis.org>,
<tanghui20@...wei.com>, <vatsa@...ux.vnet.ibm.com>,
<vincent.guittot@...aro.org>, <vschneid@...hat.com>, <zhangqiao22@...wei.com>
Subject: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
Consider the following sequence on a CPU configured with nohz_full:
1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
bandwidth control. The group scheduling entity (gse) of cgroup A, to
which task P is attached, is dequeued and the CPU switches to idle.
2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
another cgroup B (not throttled).
During sched_move_task(), the task P is observed as queued but not
running, and therefore no resched_curr() is triggered.
3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
explicit scheduling event, i.e., resched_curr().
4) Later, cgroup A is unthrottled. However, the task P has already been
migrated out of cgroup A, so unthrottle_cfs_rq() may observe
load_weight == 0 and return early without calling resched_curr().
At this point, the task P is runnable in cgroup B (not throttled), but
the CPU remains in do_idle() with no pending reschedule point. The
system stays in this state until an unrelated event (e.g. a new task
wakeup) triggers resched_curr() and breaks the nohz_full idle state,
at which point task P finally gets scheduled.
The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.
Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke wakeup_preempt() for tasks that
were queued but not running at the time of migration. This ensures that
runnable tasks are reconsidered for scheduling even when nohz_full
suppresses periodic ticks.
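The decision described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: every toy_* name
is hypothetical and not a kernel symbol, and the stand-in for
wakeup_preempt() is reduced to "a runnable task always preempts an idle
CPU", which is far simpler than the real scheduling-class logic.

```c
#include <stdbool.h>

/* Toy user-space model; none of these types or functions are the kernel's. */
enum toy_state { TOY_BLOCKED, TOY_QUEUED, TOY_RUNNING };

struct toy_cpu {
	bool idle;         /* CPU is sitting in its idle loop */
	bool need_resched; /* a reschedule request is pending */
};

/* Hypothetical stand-in for resched_curr(): request a reschedule. */
static void toy_resched_curr(struct toy_cpu *cpu)
{
	cpu->need_resched = true;
}

/*
 * Hypothetical stand-in for wakeup_preempt(), simplified (assumption):
 * a newly runnable task always preempts an idle CPU and never preempts
 * a busy one.
 */
static void toy_wakeup_preempt(struct toy_cpu *cpu)
{
	if (cpu->idle)
		toy_resched_curr(cpu);
}

/*
 * Models only the tail of sched_move_task(). 'patched' selects whether
 * queued-but-not-running tasks are reconsidered after the move.
 */
static void toy_move_task(struct toy_cpu *cpu, enum toy_state state,
			  bool patched)
{
	if (state == TOY_RUNNING)
		toy_resched_curr(cpu);
	else if (patched && state == TOY_QUEUED)
		toy_wakeup_preempt(cpu);
}

/*
 * Scenario from the changelog: the CPU is idle and task P is queued when
 * it is moved to an unthrottled cgroup. Returns true if the idle CPU is
 * asked to reschedule as a result of the move.
 */
static bool toy_scenario(bool patched)
{
	struct toy_cpu cpu = { .idle = true, .need_resched = false };

	toy_move_task(&cpu, TOY_QUEUED, patched);
	return cpu.need_resched;
}
```

In this model, toy_scenario(false) leaves need_resched clear, which
corresponds to the stall on the nohz_full CPU, while toy_scenario(true)
sets it. The running-task path is the same in both variants.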
Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@...wei.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@....com>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e..04271b77101c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9110,6 +9110,7 @@ static void sched_change_group(struct task_struct *tsk)
 void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+	bool queued = false;
 	bool resched = false;
 	struct rq *rq;
 
@@ -9122,10 +9123,13 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 			scx_cgroup_move_task(tsk);
 		if (scope->running)
 			resched = true;
+		queued = scope->queued;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }
--
2.34.1