linux-kernel - Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with zero runtime


Message-ID: <84382429-02c1-12d5-bdf4-23e880246cf3@gmail.com>
Date: Tue, 14 Oct 2025 19:01:15 +0800
From: Hao Jia <jiahao.kernel@...il.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Ben Segall
 <bsegall@...gle.com>, K Prateek Nayak <kprateek.nayak@....com>,
 Peter Zijlstra <peterz@...radead.org>,
 Chengming Zhou <chengming.zhou@...ux.dev>, Josh Don <joshdon@...gle.com>,
 Ingo Molnar <mingo@...hat.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>,
 linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
 Chuyi Zhou <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>,
 Florian Bezdeka <florian.bezdeka@...mens.com>,
 Songtang Liu <liusongtang@...edance.com>, Chen Yu <yu.c.chen@...el.com>,
 Matteo Martelli <matteo.martelli@...ethink.co.uk>,
 Michal Koutný <mkoutny@...e.com>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with
 zero runtime_remaining


Hello Aaron,

Thank you for your reply.

On 2025/10/14 17:11, Aaron Lu wrote:
> Hi Hao,
> 
> On Tue, Oct 14, 2025 at 03:43:10PM +0800, Hao Jia wrote:
>>
>> Hello Aaron,
>>
>> On 2025/9/29 15:46, Aaron Lu wrote:
>>> When a cfs_rq is to be throttled, its limbo list should be empty and
>>> that's why there is a warn in tg_throttle_down() for non empty
>>> cfs_rq->throttled_limbo_list.
>>>
>>> When running a test with the following hierarchy:
>>>
>>>             root
>>>           /      \
>>>           A*     ...
>>>        /  |  \   ...
>>>           B
>>>          /  \
>>>         C*
>>>
>>> where both A and C have quota settings, that warn on non empty limbo list
>>> is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
>>> part of the cfs_rq for the sake of simpler representation).
>>>
>>
>> I encountered a similar warning a while ago and fixed it. I have a question
>> I'd like to ask. tg_unthrottle_up(cfs_rq_C) calls enqueue_task_fair(p) to
>> enqueue a task, which requires that the runtime_remaining of task p's entire
>> task_group hierarchy be greater than 0.
>>
>> In addition to the case you fixed above,
>> When bandwidth is running normally, Is it possible that there's a corner
>> case where cfs_A->runtime_remaining > 0, but cfs_B->runtime_remaining < 0
>> could trigger a similar warning?
> 
> Do you mean B also has quota set and cfs_B's runtime_remaining < 0?
> In this case, B should be throttled and C is a descendent of B so should
> also be throttled, i.e. C can't be unthrottled when B is in throttled
> state. Do I understand you correctly?
>
Yes, both A and B have quota set.

Is the following corner case possible?
Asynchronous unthrottling lets other running entities completely
consume cfs_B->runtime_remaining (cfs_B->runtime_remaining < 0) but not
cfs_A->runtime_remaining (cfs_A->runtime_remaining > 0) by the time we
call unthrottle_cfs_rq(cfs_rq_A).

When we call unthrottle_cfs_rq(cfs_rq_A), cfs_A->runtime_remaining > 0.
But if cfs_B->runtime_remaining < 0 at that point, then
enqueue_task_fair(p)->check_enqueue_throttle(cfs_rq_B)->throttle_cfs_rq(cfs_rq_B)
may trigger a warning.

My core question is:
When we call unthrottle_cfs_rq(cfs_rq_A), we only check
cfs_rq_A->runtime_remaining. However,
enqueue_task_fair(p)->enqueue_entity(C->B->A)->check_enqueue_throttle()
requires runtime_remaining to be greater than 0 at every task_group
level of task p.

Can we guarantee this?
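
To make the question concrete, here is a toy userspace sketch, not
kernel code. The struct toy_cfs_rq and both helpers are hypothetical
names invented for illustration. The model assumes that a level with
quota gets throttled on enqueue when its runtime_remaining is <= 0, and
it ignores any runtime refill from the task group's pool that the real
enqueue path may perform first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one level of the hierarchy; NOT the kernel's struct cfs_rq. */
struct toy_cfs_rq {
	const char *name;
	bool has_quota;              /* false: no bandwidth limit at this level */
	long long runtime_remaining; /* only meaningful when has_quota is true */
	struct toy_cfs_rq *parent;   /* NULL at the top of the hierarchy */
};

/* Models the unthrottle-side check: looks only at the cfs_rq being unthrottled. */
static bool toy_unthrottle_check(const struct toy_cfs_rq *cfs_rq)
{
	return !cfs_rq->has_quota || cfs_rq->runtime_remaining > 0;
}

/*
 * Models the enqueue-side walk (C -> B -> A): returns the first level that
 * has quota but no runtime left, or NULL if every level may run.
 */
static const struct toy_cfs_rq *
toy_first_exhausted(const struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		if (cfs_rq->has_quota && cfs_rq->runtime_remaining <= 0)
			return cfs_rq;
	return NULL;
}
```

With A at runtime_remaining > 0 and B at runtime_remaining < 0, the
check on A passes while the walk starting from C stops at B. That gap
between the two checks is the corner case being asked about.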

Thanks,
Hao

>>
>> So, I previously tried to fix this issue using the following code, adding
>> the ENQUEUE_THROTTLE flag to ensure that tasks enqueued in
>> tg_unthrottle_up() aren't throttled.
>>
> 
> Yeah I think this can also fix the warning.
> I'm not sure if it is a good idea though, because on unthrottle, the
> expectation is, this cfs_rq should have runtime_remaining > 0 and if
> it's not the case, I think it is better to know why.
> 
> Thanks.