Message-ID: <xm26jz2ftfw7.fsf@google.com>
Date: Wed, 03 Sep 2025 13:55:36 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, Peter Zijlstra <peterz@...radead.org>,
Chengming Zhou <chengming.zhou@...ux.dev>, Josh Don
<joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>,
linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Chuyi Zhou
<zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>, Florian
Bezdeka <florian.bezdeka@...mens.com>, Songtang Liu
<liusongtang@...edance.com>, Chen Yu <yu.c.chen@...el.com>, Matteo
Martelli <matteo.martelli@...ethink.co.uk>, Michal Koutný
<mkoutny@...e.com>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model

Aaron Lu <ziqianlu@...edance.com> writes:
> +static bool enqueue_throttled_task(struct task_struct *p)
> +{
> + struct cfs_rq *cfs_rq = cfs_rq_of(&p->se);
> +
> + /* @p should have gone through dequeue_throttled_task() first */
> + WARN_ON_ONCE(!list_empty(&p->throttle_node));
> +
> + /*
> + * If the throttled task @p is enqueued to a throttled cfs_rq,
> + * take the fast path by directly putting the task on the
> + * target cfs_rq's limbo list.
> + *
> + * Do not do that when @p is current because the following race can
> + * cause @p's group_node to be incorrectly re-inserted in its rq's
> + * cfs_tasks list, despite being throttled:
> + *
> + * cpuX cpuY
> + * p ret2user
> + * throttle_cfs_rq_work() sched_move_task(p)
> + * LOCK task_rq_lock
> + * dequeue_task_fair(p)
> + * UNLOCK task_rq_lock
> + * LOCK task_rq_lock
> + * task_current_donor(p) == true
> + * task_on_rq_queued(p) == true
> + * dequeue_task(p)
> + * put_prev_task(p)
> + * sched_change_group()
> + * enqueue_task(p) -> p's new cfs_rq
> + * is throttled, go
> + * fast path and skip
> + * actual enqueue
> + * set_next_task(p)
> + * list_move(&se->group_node, &rq->cfs_tasks); // bug
> + * schedule()
> + *
> + * In the above race, @p's current cfs_rq is on the same rq as
> + * its previous cfs_rq, because sched_move_task() only moves a task
> + * to a different group on the same rq. We can therefore use the
> + * current cfs_rq to derive the rq and test whether @p is current.
> + */
> + if (throttled_hierarchy(cfs_rq) &&
> + !task_current_donor(rq_of(cfs_rq), p)) {
> + list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
> + return true;
> + }
> +
> + /* we can't take the fast path, do an actual enqueue */
> + p->throttled = false;
> + return false;
> +}
> +
Is there a reason that __set_next_task_fair cannot check p->se.on_rq as
well as (or instead of) task_on_rq_queued()? All of the _entity parts of
set_next/put_prev check se.on_rq for this sort of thing, so that seems
fairly standard. And se.on_rq should exactly match whether the task is on
cfs_tasks, since that add/remove is done in account_entity_{en,de}queue.
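
For illustration, a minimal sketch of that check, assuming the rest of
__set_next_task_fair stays as in the patched tree (untested, just to
show the idea):

	static void __set_next_task_fair(struct rq *rq, struct task_struct *p,
					 bool first)
	{
		struct sched_entity *se = &p->se;

		/*
		 * Only move @p to the front of cfs_tasks when its entity is
		 * actually enqueued: a task that took the throttled fast path
		 * is still task_on_rq_queued(), but account_entity_enqueue()
		 * never ran for it, so it is not on cfs_tasks at all.
		 */
		if (task_on_rq_queued(p) && se->on_rq)
			list_move(&se->group_node, &rq->cfs_tasks);

		/* ... rest unchanged ... */
	}

If that holds, enqueue_throttled_task() might not need the
task_current_donor() special case just to avoid the bogus list_move.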