Message-Id: <20251202094322.GA3378032@bytedance.com>
Date: Tue, 2 Dec 2025 17:43:22 +0800
From: "Aaron Lu" <ziqianlu@...edance.com>
To: "Bezdeka, Florian" <florian.bezdeka@...mens.com>
Cc: "bsegall@...gle.com" <bsegall@...gle.com>,
"vschneid@...hat.com" <vschneid@...hat.com>,
"xii@...gle.com" <xii@...gle.com>,
"chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
"mingo@...hat.com" <mingo@...hat.com>,
"joshdon@...gle.com" <joshdon@...gle.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"kprateek.nayak@....com" <kprateek.nayak@....com>,
"peterz@...radead.org" <peterz@...radead.org>,
"bigeasy@...utronix.de" <bigeasy@...utronix.de>,
"yu.c.chen@...el.com" <yu.c.chen@...el.com>,
"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mkoutny@...e.com" <mkoutny@...e.com>,
"mgorman@...e.de" <mgorman@...e.de>,
"zhouchuyi@...edance.com" <zhouchuyi@...edance.com>,
"Kiszka, Jan" <jan.kiszka@...mens.com>,
"liusongtang@...edance.com" <liusongtang@...edance.com>,
"matteo.martelli@...ethink.co.uk" <matteo.martelli@...ethink.co.uk>
Subject: Re: [PATCH v4 0/5] Defer throttle when task exits to user
On Tue, Dec 02, 2025 at 08:59:15AM +0000, Bezdeka, Florian wrote:
> On Fri, 2025-08-29 at 16:11 +0800, Aaron Lu wrote:
> > v4:
> > - Add cfs_bandwidth_used() in task_is_throttled() and remove unlikely()
> > for task_is_throttled(), suggested by Valentin Schneider;
> > - Add a warning for a non-empty throttle_node in enqueue_throttled_task(),
> > suggested by Valentin Schneider;
> > - Improve comments in enqueue_throttled_task(), by Valentin Schneider;
> > - Clear throttled for to-be-unthrottled tasks in tg_unthrottle_up();
> > - Change the throttled and pelt_clock_throttled fields in cfs_rq from int
> > to bool, reported by LKP;
> > - Improve changelog for patch 4, by Valentin Schneider.
> >
> > Thanks a lot for all the reviews and tests. I hope I didn't miss any of
> > them, but if I did, please let me know. I've also run Jan's RT reproducer
> > and Songtang's stress test and didn't notice any problems.
> >
> > Applies on top of sched/core, head commit 1b5f1454091e ("sched/idle: Remove
> > play_idle()").
> >
>
> Hi all,
>
> as this has all landed in 6.18 now (thanks for all the work), I would
> like to start a discussion about backporting this series, and some more
> related work (see below), to older stable releases. PREEMPT_RT-enabled
> systems are of particular interest, as this series fixes a serious
> system freeze.
>
> Has someone already looked into the backporting topic?
>
> I recall from the previous discussion that everything below 6.12 is
> hard, as scheduler internals have changed (EEVDF, vlag). Still, 6.12
> would be valuable.
>
> I have the following commits on my radar:
>
> This series:
>
> 2cd571245b43 ("sched/fair: Add related data structure for task based throttle")
> 7fc2d1439247 ("sched/fair: Implement throttle task work and related helpers")
> e1fad12dcb66 ("sched/fair: Switch to task based throttle model")
> eb962f251fbb ("sched/fair: Task based throttle time accounting")
> 5b726e9bf954 ("sched/fair: Get rid of throttled_lb_pair()")
>
> Follow up series:
> https://lore.kernel.org/all/20250910095044.278-1-ziqianlu@bytedance.com/
>
> fe8d238e646e ("sched/fair: Propagate load for throttled cfs_rq")
> fcd394866e3d ("sched/fair: update_cfs_group() for throttled cfs_rqs")
> 253b3f587241 ("sched/fair: Do not special case tasks in throttled hierarchy")
> 0d4eaf8caf8c ("sched/fair: Do not balance task to a throttled cfs_rq")
>
There is one more fix that should go in before the next one:
https://lore.kernel.org/all/20251021053522.37583-1-kprateek.nayak@amd.com/
0e4a169d1a2b ("sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled")
> Another follow up:
> https://lore.kernel.org/all/20250929074645.416-1-ziqianlu@bytedance.com/
>
> 956dfda6a708 ("sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining")
>
>
> That should hopefully be enough, right?
>
I think so.
> Any concerns, additional thoughts, missing pieces? Please let me know!
1. If the base does not have Josh's async unthrottling, 8ad075c2eb1f
("sched: Async unthrottling for cfs bandwidth"), make sure to backport
that too; otherwise the timer handler that distributes runtime can take
a long time.
2. If the base still uses CFS (i.e. predates EEVDF), the task's vruntime
has to be adjusted in dequeue_throttled_task() like below:
static void dequeue_throttled_task(struct task_struct *p, int flags)
{
	WARN_ON_ONCE(p->se.on_rq);
	list_del_init(&p->throttle_node);

	/* task blocked after throttled */
	if (flags & DEQUEUE_SLEEP) {
		p->throttled = false;
	} else {
		struct sched_entity *se = &p->se;
		struct cfs_rq *cfs_rq = cfs_rq_of(se);

		/*
		 * We are leaving this cfs_rq, but our vruntime is not
		 * normalized yet: dequeue_entity() only does that for tasks
		 * dequeued with !DEQUEUE_SLEEP, so we have to do it here.
		 * First fix up our vruntime so that the current sleep
		 * doesn't earn an 'unlimited' sleep bonus, then make it
		 * relative to this cfs_rq's min_vruntime.
		 */
		place_entity(cfs_rq, se, 0);
		se->vruntime -= cfs_rq->min_vruntime;
	}
}
3. Also in dequeue_throttled_task(): if the base doesn't have commit
e1f078f50478 ("sched/fair: Combine detach into dequeue when migrating
task"), the following is not necessary, because migrate_task_rq_fair()
already takes care of it:
	/*
	 * task is migrating off its old cfs_rq, detach
	 * the task's load from its old cfs_rq.
	 */
	if (task_on_rq_migrating(p))
		detach_task_cfs_rq(p);
That's what I can think of right now.
I did a backport for a 5.15-based kernel and can probably post it
somewhere if it would be useful; just let me know.