Message-ID: <177011751256.2495410.3734365532314031073.tip-bot2@tip-bot2>
Date: Tue, 03 Feb 2026 11:18:32 -0000
From: "tip-bot2 for Zicheng Qu" <tip-bot2@...utronix.de>
To: linux-tip-commits@...r.kernel.org
Cc: Zicheng Qu <quzicheng@...wei.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
K Prateek Nayak <kprateek.nayak@....com>, Aaron Lu <ziqianlu@...edance.com>,
x86@...nel.org, linux-kernel@...r.kernel.org
Subject: [tip: sched/core] sched: Re-evaluate scheduling when migrating queued
tasks out of throttled cgroups

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e34881c84c255bc300f24d9fe685324be20da3d1
Gitweb:        https://git.kernel.org/tip/e34881c84c255bc300f24d9fe685324be20da3d1
Author:        Zicheng Qu <quzicheng@...wei.com>
AuthorDate:    Fri, 30 Jan 2026 08:34:38
Committer:     Peter Zijlstra <peterz@...radead.org>
CommitterDate: Tue, 03 Feb 2026 12:04:19 +01:00

sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups

Consider the following sequence on a CPU configured with nohz_full:

1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
   bandwidth control. The group sched_entity (gse) of cgroup A, to which
   task P is attached, is dequeued and the CPU switches to idle.

2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
   another cgroup B (not throttled). During sched_move_task(), task P is
   observed as queued but not running, and therefore no resched_curr() is
   triggered.

3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
   explicit scheduling event, i.e. a resched_curr() (see the simplified
   sketch below).

4) For kernels <= 5.10: later, when cgroup A is unthrottled, task P has
   already been migrated out of cgroup A, so unthrottle_cfs_rq() may
   observe load.weight == 0 and return early without calling
   resched_curr(). For kernels >= 6.6: the unthrottling path triggers
   resched_curr() in almost all cases, even when no runnable tasks remain
   in the unthrottled cgroup, which prevents the idle stall described
   above. However, if cgroup A is removed before it gets unthrottled, the
   unthrottling path for cgroup A is never executed, so again no
   resched_curr() is called.

5) At this point, task P is runnable in cgroup B (not throttled), but the
   CPU remains in do_idle() with no pending reschedule point. The system
   stays in this state until an unrelated event (e.g. a new task wakeup)
   happens to trigger a resched_curr() and break the nohz_full idle state,
   at which point task P finally gets scheduled.
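
For reference, the reason the CPU stays parked in steps 3) and 5) is the
shape of the idle loop. Roughly (heavily simplified sketch of
kernel/sched/idle.c:do_idle(), not the verbatim kernel code):

  static void do_idle(void)           /* simplified sketch */
  {
          while (!need_resched()) {
                  /*
                   * Enter a low-power idle state. On a nohz_full CPU there
                   * is no periodic tick to come back and re-check the
                   * runqueue, so this loop only exits once someone sets
                   * TIF_NEED_RESCHED for this CPU, which is what
                   * resched_curr() does.
                   */
                  cpuidle_idle_call();
          }
          schedule_idle();            /* pick the next runnable task */
  }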

The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore never triggers a resched_curr(), while
the later unthrottling path no longer sees the migrated task.
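
Before this change, the tail of sched_move_task() behaved roughly as
follows (simplified sketch, not the verbatim code; the exact structure
differs between kernel versions):

  /* simplified sketch of the pre-patch logic in sched_move_task() */
  running = task_current(rq, tsk);      /* is tsk what rq is running now? */
  queued  = task_on_rq_queued(tsk);     /* is tsk enqueued on a runqueue? */

  /* ... dequeue from the old group, change group, re-enqueue ... */

  if (running)
          resched_curr(rq);     /* only the running case kicks the CPU */

  /*
   * queued && !running (step 2 above): the task is re-enqueued on a
   * possibly idle nohz_full CPU, but TIF_NEED_RESCHED is never set, so
   * do_idle() is never interrupted.
   */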

Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke wakeup_preempt() (formerly
check_preempt_curr()) for tasks that were queued but not running at the
time of migration. This ensures that runnable tasks are reconsidered for
scheduling even when nohz_full suppresses the periodic tick.

Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@...wei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@....com>
Reviewed-by: Aaron Lu <ziqianlu@...edance.com>
Tested-by: Aaron Lu <ziqianlu@...edance.com>
Link: https://patch.msgid.link/20260130083438.1122457-1-quzicheng@huawei.com
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f2dc0a..b411e4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9126,6 +9126,7 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 	bool resched = false;
+	bool queued = false;
 	struct rq *rq;
 
 	CLASS(task_rq_lock, rq_guard)(tsk);
@@ -9137,10 +9138,13 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 		scx_cgroup_move_task(tsk);
 		if (scope->running)
 			resched = true;
+		queued = scope->queued;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }