[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250211163952.7e3a82a1@gandalf.local.home>
Date: Tue, 11 Feb 2025 16:39:52 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Harshit Agarwal <harshit@...anix.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin
Schneider <vschneid@...hat.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Jon Kohler <jon@...anix.com>, Gauri
Patwardhan <gauri.patwardhan@...anix.com>, Rahul Chunduru
<rahul.chunduru@...anix.com>, Will Ton <william.ton@...anix.com>
Subject: Re: [PATCH] sched/rt: Fix race in push_rt_task
On Tue, 11 Feb 2025 21:08:22 +0000
Harshit Agarwal <harshit@...anix.com> wrote:
> Thanks Steve for taking a look. Yes we should ideally remove any conditions that are
> subsumed by task != pick_next_pushable_task condition, however I am not sure if anyone (say ttwu())
> will try to modify p’s state without removing it from the pushable tasks list first. In such a case
> pick_next_pushable_task will still pick p again and then it will match and will proceed to do the migration
> while ttwu() is trying to wake it up. Most likely, some asserts like !task_on_rq_queued etc will be hit in
> pick_next_pushable_task as soon as p is picked. If we can be sure that p’s state is modified by others
> only after it being removed from pushable tasks list of this CPU then it should be safe to remove it
> else not.
Note that all tasks on the pick_next_pushable list is in the running state.
Nothing should be trying to wake it up while its on that list. That is, if
p is still on the pushable tasks then everything should be still fine.
Especially now that the rq locks are still held again.
I think that is the only check we need. The is_migration_disabled() check
should probably be checked earlier, as the only way it could be set is if
the current task preempted it. If it had migrated, and migrated back, it
couldn't have that set on this current CPU otherwise it would not have
migrated.
Here's the current checks again:
if (unlikely(task_rq(task) != rq ||
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
task_on_cpu(rq, task) ||
!rt_task(task) ||
is_migration_disabled(task) ||
!task_on_rq_queued(task))) {
Let's look at pick_next_pushable_task():
p = plist_first_entry(&rq->rt.pushable_tasks,
struct task_struct, pushable_tasks);
BUG_ON(rq->cpu != task_cpu(p));
BUG_ON(task_current(rq, p));
BUG_ON(task_current_donor(rq, p));
BUG_ON(p->nr_cpus_allowed <= 1);
BUG_ON(!task_on_rq_queued(p));
BUG_ON(!rt_task(p));
If task_rq(task) != rq is true, then BUG_ON(rq->cpu != task_cpu(p)) would trigger.
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask
We still need that check, to make sure the CPU we picked is in the task's affinity.
If task_on_cpu(rq, task) is true, then BUG_ON(task_current(rq, p)) would trigger.
If !rt_task(task) is true then BUG_ON(!rt_task(p)) would trigger.
is_migration_disabled(task)
Is still needed, but could possibly be moved up? (unrelated change though)
If !task_on_rq_queued(task) is true then BUG_ON(!task_on_rq_queued(p)) would trigger.
Thus, I think we can change that condition to just:
if (is_migration_disabled(task) ||
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
task != pick_next_pushable_task(rq)) {
I moved the is_migration_disabled() up as that's the quickest check. If
that's true, no need to test the other conditions.
-- Steve
Powered by blists - more mailing lists