Message-ID: <ee48e0f3-7b48-4f1c-a1d8-07779e35e949@amd.com>
Date: Wed, 21 Jan 2026 10:54:11 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Aaron Lu <ziqianlu@...edance.com>, Zicheng Qu <quzicheng@...wei.com>
CC: <bsegall@...gle.com>, <dhaval@...ux.vnet.ibm.com>,
	<dietmar.eggemann@....com>, <juri.lelli@...hat.com>,
	<linux-kernel@...r.kernel.org>, <mgorman@...e.de>, <mingo@...hat.com>,
	<peterz@...radead.org>, <rostedt@...dmis.org>, <tanghui20@...wei.com>,
	<vatsa@...ux.vnet.ibm.com>, <vincent.guittot@...aro.org>,
	<vschneid@...hat.com>, <zhangqiao22@...wei.com>
Subject: Re: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks
 out of throttled cgroups

Hello Aaron,

On 1/21/2026 9:19 AM, Aaron Lu wrote:
> On Tue, Jan 20, 2026 at 03:25:49AM +0000, Zicheng Qu wrote:
>> Consider the following sequence on a CPU configured with nohz_full:
>>
>> 1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
>>    bandwidth control. The gse (cgroup A) where the task P attached is
>>    dequeued and the CPU switches to idle.
>>
>> 2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
>>    another cgroup B (not throttled).
>>
>>    During sched_move_task(), the task P is observed as queued but not
>>    running, and therefore no resched_curr() is triggered.
>>
>> 3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
>>    explicit scheduling event, i.e., resched_curr().
>>
>> 4) Later, cgroup A is unthrottled. However, the task P has already been
>>    migrated out of cgroup A, so unthrottle_cfs_rq() may observe
>>    load_weight == 0 and return early without resched_curr() being called.
> 
> I suppose this is only possible when the unthrottled cfs_rq has been
> fully decayed, i.e. !cfs_rq->on_list is true?

Ack! Since we detach the task from the cfs_rq during
task_change_group_fair(), the cfs_rq_is_decayed() check during
tg_unthrottle_up() can return true: we skip putting the cfs_rq on the
leaf_cfs_rq_list, and unthrottle_cfs_rq() will then skip the resched.

> Because only in that case,
> it will skip the resched_curr() in the bottom of unthrottle_cfs_rq() for
> the scenario you have described.

Indeed. Happy coincidence that we checked "rq->cfs.nr_queued", so an
unrelated unthrottle could still force a resched for a missed one :-)

> 
> Looking at this logic, I feel the early return due to
> (!cfs_rq->load.weight) && (!cfs_rq->on_list) is strange, because the
> resched at the bottom:
> 
> 	/* Determine whether we need to wake up potentially idle CPU: */
> 	if (rq->curr == rq->idle && rq->cfs.nr_queued)
> 		resched_curr(rq);
> 
> should not depend on whether cfs_rq is fully decayed or not...

But if it is off the list, then it doesn't have any tasks to resched
anyway, and in Zicheng's scenario too, the cfs_rq won't have any tasks
at the time of unthrottle.

> 
> I think it should be something like this:
> - complete the branch if no tasks are enqueued but it is still on_list;
> - only resched_curr() if a task gets enqueued
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e71302282671c..e09da54a5d117 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6009,9 +6009,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>  	/* update hierarchical throttle state */
>  	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
>  
> -	if (!cfs_rq->load.weight) {
> -		if (!cfs_rq->on_list)
> -			return;
> +	if (!cfs_rq->load.weight && cfs_rq->on_list) {
>  		/*
>  		 * Nothing to run but something to decay (on_list)?
>  		 * Complete the branch.
> @@ -6025,7 +6023,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>  	assert_list_leaf_cfs_rq(rq);
>  
>  	/* Determine whether we need to wake up potentially idle CPU: */
> -	if (rq->curr == rq->idle && rq->cfs.nr_queued)
> +	if (rq->curr == rq->idle && cfs_rq->nr_queued)
>  		resched_curr(rq);
>  }
>  
> 
> Thoughts?

Yes, checking cfs_rq->nr_queued should indicate whether new tasks were
woken on this unthrottled hierarchy. That should also make it easier to
spot scenarios like the one Zicheng experienced, if there are any more
of those lurking around.

-- 
Thanks and Regards,
Prateek
