linux-kernel - Re: [RFC PATCH 0/6] stop_machine: kill stop_cpus_mutex and stop_cpus

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150629040251.GA14558@redhat.com>
Date:	Mon, 29 Jun 2015 06:02:51 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	paulmck@...ux.vnet.ibm.com, tj@...nel.org, mingo@...hat.com,
	der.herr@...r.at, dave@...olabs.net, riel@...hat.com,
	viro@...IV.linux.org.uk, torvalds@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/6] stop_machine: kill stop_cpus_mutex and
	stop_cpus_lock

On 06/26, Oleg Nesterov wrote:
>
> 2 cmpxchg()'s vs 2 spin_lock()'s. Plus wake_up(), but we can check
> waitqueue_active().
>
> Do you think thi will be noticeably slower?
>
> Of course, if it races with another stop_two_cpus/stop_cpus it will
> sleep, but in this case we need to wait anyway.
>
>
> And I don't think that percpu-rwsem instead of stop_cpu_mutex makes
> sense. at least I don't understand how can it help. OK, stop_two_cpus()
> can use percpu_down_read() to avoid the deadlock with stop_cpus(), but
> you still need double-lock... So I don't think this will make it faster,
> this will just penalize stop_cpus(). Or I misunderstood.
>
> So I am still not convinced... But probably I am too biased ;)

Yes... I'll probably try to make v2, this version is overcomplicated
and buggy.


> Btw. I can't understand the cpu_active() checks in stop_two_cpus().
> Do we really need them?

Ah, please ignore.

Yes, we can't rely on stopper->enabled check in cpu_stop_queue_work(),
cpu_stop_signal_done() does not update multi_stop_data->num_threads /
->thread_ack. So we need to ensure that cpu_online() == T for both CPUS
or multi_cpu_stop() can hang.

But we can't use cpu_online() instead, take_cpu_down() can be already
queued.

So this relies on the fact that CPU_DOWN_PREPARE (which removes CPU
from cpu_active_mask) is called before stop_machine(take_cpu_down) and
we do not care that cpu_active() is not stable; if we see cpu_active()
cpu_online() can't change unders us because take_cpu_down() was not
queued.

If we change stop_two_cpus() to use stop_work_alloc_one() it can use
cpu_online(),

	int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
	{
		struct cpu_stop_work *work1, *work2;
		struct cpu_stop_done done;
		struct multi_stop_data msdata = {
			.fn = fn,
			.data = arg,
			.num_threads = 2,
			.active_cpus = cpumask_of(cpu1),
		};

		set_state(&msdata, MULTI_STOP_PREPARE);
		cpu_stop_init_done(&done, 2);

		if (cpu1 > cpu2)
			swap(cpu1, cpu2);

		work1 = stop_work_alloc_one(cpu1, true);
		work2 = stop_work_alloc_one(cpu2, true);
		/* stop_machine() is blocked, cpu can't go away */
		if (cpu_online(cpu1) && cpu_online(cpu2)) {
			work1->fn   = work2->fn   = multi_cpu_stop;
			work1->arg  = work2->arg  = &msdata;
			work1->done = work2->done = &done;

			preempt_disable();
			cpu_stop_queue_work(cpu1, work1);
			cpu_stop_queue_work(cpu2, work2);
			preempt_enable();
			wait_for_completion(&done.completion);
		}

		stop_work_free_one(cpu1);
		stop_work_free_one(cpu2);
		stop_work_wake_up();

		return done.executed ? done.ret : -ENOENT;
	}

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/