[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250523105222.GJ24938@noisy.programming.kicks-ass.net>
Date: Fri, 23 May 2025 12:52:22 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>,
Paul McKenney <paulmck@...nel.org>
Subject: Re: [PATCH 2/7] sched/fair: prepare throttle path for task based
throttle
On Fri, May 23, 2025 at 05:53:50PM +0800, Aaron Lu wrote:
> On Thu, May 22, 2025 at 08:40:02PM +0800, Aaron Lu wrote:
> > On Thu, May 22, 2025 at 01:54:18PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 22, 2025 at 07:44:55PM +0800, Aaron Lu wrote:
> > > > On Thu, May 22, 2025 at 12:48:43PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, May 20, 2025 at 06:41:05PM +0800, Aaron Lu wrote:
> > > > >
> > > > > > static void throttle_cfs_rq_work(struct callback_head *work)
> > > > > > {
> > > > > > + struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
> > > > > > + struct sched_entity *se;
> > > > > > + struct cfs_rq *cfs_rq;
> > > > > > + struct rq *rq;
> > > > > > +
> > > > > > + WARN_ON_ONCE(p != current);
> > > > > > + p->sched_throttle_work.next = &p->sched_throttle_work;
> > > > > > +
> > > > > > + /*
> > > > > > + * If task is exiting, then there won't be a return to userspace, so we
> > > > > > + * don't have to bother with any of this.
> > > > > > + */
> > > > > > + if ((p->flags & PF_EXITING))
> > > > > > + return;
> > > > > > +
> > > > > > + scoped_guard(task_rq_lock, p) {
> > > > > > + se = &p->se;
> > > > > > + cfs_rq = cfs_rq_of(se);
> > > > > > +
> > > > > > + /* Raced, forget */
> > > > > > + if (p->sched_class != &fair_sched_class)
> > > > > > + return;
> > > > > > +
> > > > > > + /*
> > > > > > + * If not in limbo, then either replenish has happened or this
> > > > > > + * task got migrated out of the throttled cfs_rq, move along.
> > > > > > + */
> > > > > > + if (!cfs_rq->throttle_count)
> > > > > > + return;
> > > > > > + rq = scope.rq;
> > > > > > + update_rq_clock(rq);
> > > > > > + WARN_ON_ONCE(!list_empty(&p->throttle_node));
> > > > > > + dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
> > > > > > + list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> > > > > > + resched_curr(rq);
> > > > > > + }
> > > > > > +
> > > > > > + cond_resched_tasks_rcu_qs();
> > > > > > }
> > > > >
> > > > > What's that cond_resched thing about? The general plan is to make
> > > > > cond_resched go away.
> > > >
> > > > Got it.
> > > >
> > > > The purpose is to let throttled task schedule and also mark a task rcu
> > > > quiescent state. Without this cond_resched_tasks_rcu_qs(), this task
> > > > will be scheduled by cond_resched() in task_work_run() and since that is
> > > > a preempt schedule, it didn't mark a task rcu quiescent state.
> > > >
> > > > Any suggestion here? Perhaps a plain schedule()? Thanks.
> > >
> > > I am confused, this is task_work_run(), that is ran from
> > > exit_to_user_mode_loop(), which contains a schedule().
> >
>
> I should probably have added that the schedule() call contained in
> exit_to_user_mode_loop() is early in that loop, where the to-be-throttled
> task doesn't have need_resched bit set yet.
No, but if it does get set, it will get picked up at:
ti_work = read_thread_flags();
and since TIF_NEED_RESCHED is part of EXIT_TO_USER_MODE_WORK, we'll get
another cycle, and do the schedule() thing.
> > There is a cond_resched() in task_work_run() loop:
> >
> > do {
> > next = work->next;
> > work->func(work);
> > work = next;
> > cond_resched();
> > } while (work);
That cond_resched() is equally going away.
> > And when this throttle work returns with need_resched bit set,
> > cond_resched() will cause a schedule but that didn't mark a task
> > quiescent state...
>
> Another approach I can think of is to add a test of task_is_throttled()
> in rcu_tasks_is_holdout(). I remembered when I tried this before, I can
> hit the following path:
So this really is about task_rcu needing something? Let me go look at
task-rcu.
So AFAICT, exit_to_user_mode_loop() will do schedule(), which will call
__schedule(SM_NONE), which then will have preempt = false and call:
rcu_note_context_switch(false) which in turn will do:
rcu_task_rq(current, false).
This should be sufficient, no?
Powered by blists - more mailing lists