[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z_OW9a_U4WQFWEBH@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 7 Apr 2025 11:12:21 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Harshit Agarwal <harshit@...anix.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Fix race in push_dl_task
Hi,
I think I like this version better, but others feel free to disagree. :)
A few comments inline below.
On 17/03/25 02:23, Harshit Agarwal wrote:
> This fix is the deadline version of the change made to the rt scheduler
> titled: "sched/rt: Fix race in push_rt_task".
> Here is the summary of the issue:
I would probably remove the paragraph above and simply describe what the
issues is like you do below.
> When a CPU chooses to call push_dl_task and picks a task to push to
> another CPU's runqueue then it will call find_lock_later_rq method
> which would take a double lock on both CPUs' runqueues. If one of the
> locks aren't readily available, it may lead to dropping the current
> runqueue lock and reacquiring both the locks at once. During this window
> it is possible that the task is already migrated and is running on some
> other CPU. These cases are already handled. However, if the task is
> migrated and has already been executed and another CPU is now trying to
> wake it up (ttwu) such that it is queued again on the runqeue
> (on_rq is 1) and also if the task was run by the same CPU, then the
> current checks will pass even though the task was migrated out and is no
> longer in the pushable tasks list.
> Please go through the original change for more details on the issue.
>
> In this fix, after the lock is obtained inside the find_lock_later_rq we
Please use imperative mode
https://elixir.bootlin.com/linux/v6.13.7/source/Documentation/process/submitting-patches.rst#L94
> ensure that the task is still at the head of pushable tasks list. Also
> removed some checks that are no longer needed with the addition this new
> check.
> However, the check of pushable tasks list only applies when
> find_lock_later_rq is called by push_dl_task. For the other caller i.e.
> dl_task_offline_migration, we use the existing checks.
>
> Signed-off-by: Harshit Agarwal <harshit@...anix.com>
> Cc: stable@...r.kernel.org
> ---
> Changes in v2:
> - As per Juri's suggestion, moved the check inside find_lock_later_rq
> similar to rt change. Here we distinguish among the push_dl_task
> caller vs dl_task_offline_migration by checking if the task is
> throttled or not.
> - Fixed the commit message to refer to the rt change by title.
> - Link to v1:
> https://lore.kernel.org/lkml/20250307204255.60640-1-harshit@nutanix.com/
> ---
> kernel/sched/deadline.c | 66 ++++++++++++++++++++++++++---------------
> 1 file changed, 42 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 38e4537790af..2366801b4557 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2621,6 +2621,25 @@ static int find_later_rq(struct task_struct *task)
> return -1;
> }
>
> +static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
> +{
> + struct task_struct *p;
> +
> + if (!has_pushable_dl_tasks(rq))
> + return NULL;
> +
> + p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
> +
> + WARN_ON_ONCE(rq->cpu != task_cpu(p));
> + WARN_ON_ONCE(task_current(rq, p));
> + WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
> +
> + WARN_ON_ONCE(!task_on_rq_queued(p));
> + WARN_ON_ONCE(!dl_task(p));
> +
> + return p;
> +}
> +
> /* Locks the rq it finds */
> static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
> {
> @@ -2648,12 +2667,30 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
>
> /* Retry if something changed. */
> if (double_lock_balance(rq, later_rq)) {
> - if (unlikely(task_rq(task) != rq ||
> + /*
> + * We had to unlock the run queue. In the meantime,
Maybe rephrase as "doble_lock_balance might have released rq->lock ...".
> + * task could have migrated already or had its affinity
> + * changed.
> + * It is possible the task was scheduled, set
> + * "migrate_disabled" and then got preempted, so we must
> + * check the task migration disable flag here too.
> + * For throttled task (dl_task_offline_migration), we
> + * check if the task is migrated to a different rq or
> + * is not a dl task anymore.
> + * For the non-throttled task (push_dl_task), the check
> + * to ensure that this task is still at the head of the
> + * pushable tasks list is enough.
Maybe you can make a bullet point list out of this comment so that it's
even easier to associate it to the conditions below?
> + */
> + if (unlikely(is_migration_disabled(task) ||
> !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
> - task_on_cpu(rq, task) ||
> - !dl_task(task) ||
> - is_migration_disabled(task) ||
> - !task_on_rq_queued(task))) {
> + (task->dl.dl_throttled &&
> + (task_rq(task) != rq ||
> + task_on_cpu(rq, task) ||
> + !dl_task(task) ||
> + !task_on_rq_queued(task))) ||
> + (!task->dl.dl_throttled &&
> + task != pick_next_pushable_dl_task(rq)))) {
> +
Thanks!
Juri
Powered by blists - more mailing lists