Message-Id: <DENNDH3ON03P.2C20N76HOLQ6C@siemens.com>
Date: Tue, 02 Dec 2025 11:09:48 +0100
From: Florian Bezdeka <florian.bezdeka@...mens.com>
To: "Aaron Lu" <ziqianlu@...edance.com>, "Bezdeka, Florian"
<florian.bezdeka@...mens.com>
Cc: "bsegall@...gle.com" <bsegall@...gle.com>, "vschneid@...hat.com"
<vschneid@...hat.com>, "xii@...gle.com" <xii@...gle.com>,
"chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>, "mingo@...hat.com"
<mingo@...hat.com>, "joshdon@...gle.com" <joshdon@...gle.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"kprateek.nayak@....com" <kprateek.nayak@....com>, "peterz@...radead.org"
<peterz@...radead.org>, "bigeasy@...utronix.de" <bigeasy@...utronix.de>,
"yu.c.chen@...el.com" <yu.c.chen@...el.com>, "dietmar.eggemann@....com"
<dietmar.eggemann@....com>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mkoutny@...e.com" <mkoutny@...e.com>, "mgorman@...e.de" <mgorman@...e.de>,
"zhouchuyi@...edance.com" <zhouchuyi@...edance.com>, "Kiszka, Jan"
<jan.kiszka@...mens.com>, "liusongtang@...edance.com"
<liusongtang@...edance.com>, "matteo.martelli@...ethink.co.uk"
<matteo.martelli@...ethink.co.uk>
Subject: Re: [PATCH v4 0/5] Defer throttle when task exits to user
On Tue Dec 2, 2025 at 10:43 AM CET, Aaron Lu wrote:
[snip]
>> Hi all,
>>
>> as this all has arrived in 6.18 now - thanks for all the work - I would
>> like to start a discussion about backporting this series - and some more
>> related work, see below - to older stable releases. PREEMPT_RT-enabled
>> systems are of particular interest, as this series fixes a serious
>> system freeze.
>>
>> Has someone already looked into the backporting topic?
>>
>> I remember from the previous discussion that everything below 6.12
>> is hard, as scheduler internals have changed (EEVDF, vlag). Still, 6.12
>> would be valuable.
>>
>> I have the following commits on my radar:
>>
>> This series:
>>
>> 2cd571245b43 ("sched/fair: Add related data structure for task based throttle")
>> 7fc2d1439247 ("sched/fair: Implement throttle task work and related helpers")
>> e1fad12dcb66 ("sched/fair: Switch to task based throttle model")
>> eb962f251fbb ("sched/fair: Task based throttle time accounting")
>> 5b726e9bf954 ("sched/fair: Get rid of throttled_lb_pair()")
>>
>> Follow up series:
>> https://lore.kernel.org/all/20250910095044.278-1-ziqianlu@bytedance.com/
>>
>> fe8d238e646e ("sched/fair: Propagate load for throttled cfs_rq")
>> fcd394866e3d ("sched/fair: update_cfs_group() for throttled cfs_rqs")
>> 253b3f587241 ("sched/fair: Do not special case tasks in throttled hierarchy")
>> 0d4eaf8caf8c ("sched/fair: Do not balance task to a throttled cfs_rq")
>>
>
> There is one more fix that goes in before the next one:
> https://lore.kernel.org/all/20251021053522.37583-1-kprateek.nayak@amd.com/
>
> 0e4a169d1a2b ("sched/fair: Start a cfs_rq on throttled hierarchy with
> PELT clock throttled")
>
Thanks for the heads-up! Highly appreciated.

>> Another follow up:
>> https://lore.kernel.org/all/20250929074645.416-1-ziqianlu@bytedance.com/
>>
>> 956dfda6a708 ("sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining")
>>
>>
>> That should hopefully be enough, right?
>>
>
> I think so.
>
>> Any concerns, additional thoughts, missing pieces? Please let me know!
>
> 1. If the base does not have Josh's async unthrottle commit
> 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth"),
> make sure to backport it too; otherwise the timer handler that
> distributes runtime can take a long time.

Thanks, noted.
>
> 2. If the base still uses CFS rather than EEVDF, the task's vruntime
> has to be adjusted in dequeue_throttled_task() like below:
>
> static void dequeue_throttled_task(struct task_struct *p, int flags)
> {
> 	WARN_ON_ONCE(p->se.on_rq);
> 	list_del_init(&p->throttle_node);
>
> 	/* task blocked after throttled */
> 	if (flags & DEQUEUE_SLEEP) {
> 		p->throttled = false;
> 	} else {
> 		struct sched_entity *se = &p->se;
> 		struct cfs_rq *cfs_rq;
>
> 		/*
> 		 * We are leaving this cfs_rq but our vruntime is not
> 		 * normalized yet, as that is only done for tasks dequeued
> 		 * with !DEQUEUE_SLEEP in dequeue_entity(). So we have to
> 		 * fix up our vruntime here, so that the current sleep
> 		 * doesn't cause an 'unlimited' sleep bonus, and then
> 		 * normalize it.
> 		 */
> 		cfs_rq = cfs_rq_of(se);
> 		place_entity(cfs_rq, se, 0);
> 		se->vruntime -= cfs_rq->min_vruntime;
> 	}
> }
>
> 3. Also in this dequeue_throttled_task() function: if the base doesn't
> have commit e1f078f50478 ("sched/fair: Combine detach into dequeue
> when migrating task"), then the following is not necessary, because
> migrate_task_rq_fair() has already dealt with it:
>
> 	/*
> 	 * task is migrating off its old cfs_rq, detach
> 	 * the task's load from its old cfs_rq.
> 	 */
> 	if (task_on_rq_migrating(p))
> 		detach_task_cfs_rq(p);
>
> That's what I can think of right now.
>
> I did a backport for a 5.15-based kernel and can post it somewhere
> if it is useful; just let me know.

So you have already backported the entire logic to older releases. Wow.
Would 6.1 be possible as well, or are there other blockers ahead? I'm
asking because 6.1 is the baseline for the affected systems.

I think 6.12 should be doable with the patches mentioned above (your
comments included). If your backport helps us get down to 6.1, that
would be even better.