Message-ID: <20250910030329.GB90@bytedance>
Date: Wed, 10 Sep 2025 11:03:29 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Benjamin Segall <bsegall@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>,
Songtang Liu <liusongtang@...edance.com>,
Chen Yu <yu.c.chen@...el.com>,
Matteo Martelli <matteo.martelli@...ethink.co.uk>,
Michal Koutný <mkoutny@...e.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model
On Thu, Sep 04, 2025 at 07:05:04PM +0800, Aaron Lu wrote:
> On Thu, Sep 04, 2025 at 03:21:06PM +0530, K Prateek Nayak wrote:
> > Hello Aaron,
> >
> > On 9/4/2025 1:46 PM, Aaron Lu wrote:
> > > @@ -8722,15 +8730,6 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> > > if (unlikely(se == pse))
> > > return;
> > >
> > > - /*
> > > - * This is possible from callers such as attach_tasks(), in which we
> > > - * unconditionally wakeup_preempt() after an enqueue (which may have
> > > - * lead to a throttle). This both saves work and prevents false
> > > - * next-buddy nomination below.
> > > - */
> > > - if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
> > > - return;
> >
> > I think we should have a:
> >
> > if (task_is_throttled(p))
> > return;
> >
> > here. I can see at least one possibility via prio_changed_fair()
>
> Ah right. I didn't realize wakeup_preempt() can be called for a throttled
> task; I don't think that is expected. What about forbidding that? :)
> (not tested in any way, just to show the idea and get feedback)
>
It turned out that there are other places that also call wakeup_preempt()
for a throttled task, such as sched_setaffinity() -> move_queued_task() ->
wakeup_preempt(). That logic cannot be changed because it lives outside of
fair, so I'll go with your suggestion and add a task_is_throttled() check
in check_preempt_wakeup_fair().
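
To illustrate the idea only, here is a minimal user-space sketch of such an
early return. It is not the kernel code: struct model_task,
model_task_is_throttled() and model_check_preempt_wakeup() are hypothetical
stand-ins, and the next-buddy nomination is reduced to setting a flag.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a task; not the kernel's task_struct. */
struct model_task {
	bool throttled;   /* what a task_is_throttled(p)-style check would report */
	bool next_buddy;  /* set when the task is nominated as next buddy */
};

/* Assumed helper mirroring the role of task_is_throttled() in the discussion. */
static bool model_task_is_throttled(const struct model_task *p)
{
	return p->throttled;
}

/*
 * Models the guard discussed for check_preempt_wakeup_fair(): bail out
 * before any preemption or next-buddy logic when the woken task is
 * throttled. Returns true if the remaining logic ran, false if skipped.
 */
static bool model_check_preempt_wakeup(struct model_task *curr,
				       struct model_task *p)
{
	/* Same task: nothing to do. */
	if (curr == p)
		return false;

	/* Callers outside fair may pass a throttled task; skip it here. */
	if (model_task_is_throttled(p))
		return false;

	/* Simplified stand-in for the next-buddy nomination. */
	p->next_buddy = true;
	return true;
}
```

With the guard placed in the callee, callers outside of fair do not need to
know whether the task is throttled.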
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index cb93e74a850e8..f1383aede764f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13135,7 +13135,11 @@ static void task_fork_fair(struct task_struct *p)
> static void
> prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
> {
> - if (!task_on_rq_queued(p))
> + /*
> + * p->on_rq can be set for throttled task but there is no need to
> + * check wakeup preempt for throttled task, so use p->se.on_rq instead.
> + */
> + if (!p->se.on_rq)
> return;
>
> if (rq->cfs.nr_queued == 1)
>
> > where a throttled task might reach here. Rest looks good. I'll
> > still wait on Ben for the update_cfs_group() bits :)
> >
> > > -
> > > if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
> > > set_next_buddy(pse);
> > > }