lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250211163952.7e3a82a1@gandalf.local.home>
Date: Tue, 11 Feb 2025 16:39:52 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Harshit Agarwal <harshit@...anix.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin
 Schneider <vschneid@...hat.com>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, Jon Kohler <jon@...anix.com>, Gauri
 Patwardhan <gauri.patwardhan@...anix.com>, Rahul Chunduru
 <rahul.chunduru@...anix.com>, Will Ton <william.ton@...anix.com>
Subject: Re: [PATCH] sched/rt: Fix race in push_rt_task

On Tue, 11 Feb 2025 21:08:22 +0000
Harshit Agarwal <harshit@...anix.com> wrote:

> Thanks Steve for taking a look. Yes we should ideally remove any conditions that are 
> subsumed by task != pick_next_pushable_task condition, however I am not sure if anyone (say ttwu())
> will try to modify p’s state without removing it from the pushable tasks list first. In such a case
> pick_next_pushable_task will still pick p again and then it will match and will proceed to do the migration
> while ttwu() is trying to wake it up. Most likely, some asserts like !task_on_rq_queued etc will be hit in
> pick_next_pushable_task as soon as p is picked. If we can be sure that p’s state is modified by others
> only after it being removed from pushable tasks list of this CPU then it should be safe to remove it
> else not. 

Note that all tasks on the pick_next_pushable list is in the running state.
Nothing should be trying to wake it up while its on that list. That is, if
p is still on the pushable tasks then everything should be still fine.
Especially now that the rq locks are still held again.

I think that is the only check we need. The is_migration_disabled() check
should probably be checked earlier, as the only way it could be set is if
the current task preempted it. If it had migrated, and migrated back, it
couldn't have that set on this current CPU otherwise it would not have
migrated.

Here's the current checks again:

	if (unlikely(task_rq(task) != rq ||
		     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
		     task_on_cpu(rq, task) ||
		     !rt_task(task) ||
		     is_migration_disabled(task) ||
		     !task_on_rq_queued(task))) {


Let's look at pick_next_pushable_task():

	p = plist_first_entry(&rq->rt.pushable_tasks,
			      struct task_struct, pushable_tasks);

	BUG_ON(rq->cpu != task_cpu(p));
	BUG_ON(task_current(rq, p));
	BUG_ON(task_current_donor(rq, p));
	BUG_ON(p->nr_cpus_allowed <= 1);

	BUG_ON(!task_on_rq_queued(p));
	BUG_ON(!rt_task(p));

If task_rq(task) != rq is true, then BUG_ON(rq->cpu != task_cpu(p)) would trigger.

  !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask

We still need that check, to make sure the CPU we picked is in the task's affinity.

If task_on_cpu(rq, task) is true, then BUG_ON(task_current(rq, p)) would trigger.

If !rt_task(task) is true then BUG_ON(!rt_task(p)) would trigger.

   is_migration_disabled(task)

Is still needed, but could possibly be moved up? (unrelated change though)

If !task_on_rq_queued(task) is true then BUG_ON(!task_on_rq_queued(p)) would trigger.

Thus, I think we can change that condition to just:

	if (is_migration_disabled(task) ||
	    !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
	    task != pick_next_pushable_task(rq)) {

I moved the is_migration_disabled() up as that's the quickest check. If
that's true, no need to test the other conditions.

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ