Message-ID: <xm26jz2ftfw7.fsf@google.com>
Date: Wed, 03 Sep 2025 13:55:36 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, Peter Zijlstra <peterz@...radead.org>,
Chengming Zhou <chengming.zhou@...ux.dev>, Josh Don
<joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>,
linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Chuyi Zhou
<zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>, Florian
Bezdeka <florian.bezdeka@...mens.com>, Songtang Liu
<liusongtang@...edance.com>, Chen Yu <yu.c.chen@...el.com>, Matteo
Martelli <matteo.martelli@...ethink.co.uk>, Michal Koutný
<mkoutny@...e.com>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model

Aaron Lu <ziqianlu@...edance.com> writes:
> +static bool enqueue_throttled_task(struct task_struct *p)
> +{
> + struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
> +
> + /* @p should have gone through dequeue_throttled_task() first */
> + WARN_ON_ONCE(!list_empty(&p->throttle_node));
> +
> + /*
> + * If the throttled task @p is enqueued to a throttled cfs_rq,
> + * take the fast path by directly putting the task on the
> + * target cfs_rq's limbo list.
> + *
> + * Do not do that when @p is current because the following race can
> + * cause @p's group_node to be incorrectly re-inserted in its rq's
> + * cfs_tasks list, despite being throttled:
> + *
> + * cpuX cpuY
> + * p ret2user
> + * throttle_cfs_rq_work() sched_move_task(p)
> + * LOCK task_rq_lock
> + * dequeue_task_fair(p)
> + * UNLOCK task_rq_lock
> + * LOCK task_rq_lock
> + * task_current_donor(p) == true
> + * task_on_rq_queued(p) == true
> + * dequeue_task(p)
> + * put_prev_task(p)
> + * sched_change_group()
> + * enqueue_task(p) -> p's new cfs_rq
> + * is throttled, go
> + * fast path and skip
> + * actual enqueue
> + * set_next_task(p)
> + * list_move(&se->group_node, &rq->cfs_tasks); // bug
> + * schedule()
> + *
> + * In the above race, @p's current cfs_rq is on the same rq as
> + * its previous cfs_rq, because sched_move_task() only moves a task
> + * to a different group on the same rq. We can therefore use the
> + * current cfs_rq to derive the rq and test whether @p is current.
> + */
> + if (throttled_hierarchy(cfs_rq) &&
> + !task_current_donor(rq_of(cfs_rq), p)) {
> + list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> + return true;
> + }
> +
> + /* we can't take the fast path, do an actual enqueue */
> + p->throttled = false;
> + return false;
> +}
> +
Is there a reason that __set_next_task_fair cannot check p->se.on_rq as
well as (or instead of) task_on_rq_queued()? All of the _entity parts of
set_next/put_prev check se.on_rq for this sort of thing, so that seems
fairly standard. And se.on_rq should exactly match whether the task is on
cfs_tasks, since that add/remove is done in account_entity_{en,de}queue.
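
For illustration, a minimal sketch of that check, assuming the rest of
__set_next_task_fair stays as in the patched tree (untested, just to
show the idea):

	static void __set_next_task_fair(struct rq *rq, struct task_struct *p,
					 bool first)
	{
		struct sched_entity *se = &p->se;

		/*
		 * Only move @p to the front of cfs_tasks when its entity is
		 * actually enqueued: a task that took the throttled fast path
		 * is still task_on_rq_queued(), but account_entity_enqueue()
		 * never ran for it, so it is not on cfs_tasks at all.
		 */
		if (task_on_rq_queued(p) && se->on_rq)
			list_move(&se->group_node, &rq->cfs_tasks);

		/* ... rest unchanged ... */
	}

If that holds, enqueue_throttled_task() might not need the
task_current_donor() special case just to avoid the bogus list_move.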