linux-kernel - Re: [PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double


Message-ID: <20150730215527.GQ25159@twins.programming.kicks-ass.net>
Date:	Thu, 30 Jul 2015 23:55:27 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 6/6] stop_machine: kill stop_cpus_lock and
 lg_double_lock/unlock()

On Tue, Jul 21, 2015 at 09:22:47PM +0200, Oleg Nesterov wrote:

> +static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
> +				    int cpu2, struct cpu_stop_work *work2)
> +{
> +	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
> +	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
> +	int err;
> +retry:
> +	spin_lock_irq(&stopper1->lock);
> +	spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
> +	/*
> +	 * If we observe both CPUs active we know _cpu_down() cannot yet have
> +	 * queued its stop_machine works and therefore ours will get executed
> +	 * first. Or its not either one of our CPUs that's getting unplugged,
> +	 * in which case we don't care.
> +	 */
> +	err = -ENOENT;
> +	if (!cpu_active(cpu1) || !cpu_active(cpu2))
> +		goto unlock;
> +
> +	WARN_ON(!stopper1->enabled || !stopper2->enabled);
> +	/*
> +	 * Ensure that if we race with stop_cpus() the stoppers won't
> +	 * get queued up in reverse order, leading to system deadlock.
> +	 */
> +	err = -EDEADLK;
> +	if (stop_work_pending(stopper1) != stop_work_pending(stopper2))
> +		goto unlock;

You could DoS/false positive this by running stop_one_cpu() in a loop,
and thereby 'always' having work pending on one but not the other.

(doing so is obviously daft for other reasons)
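
To make the false positive concrete, here is a minimal userspace sketch. It is
not kernel code: struct toy_stopper, toy_queue_two_works() and
toy_attempts_under_load() are hypothetical names, and "pending" is reduced to
a single flag. It only models the quoted pending-mismatch check and a caller
that keeps one CPU's stopper busy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct cpu_stopper: only tracks whether work is queued. */
struct toy_stopper {
	bool pending;
};

/* Mirrors the quoted check: refuse to queue if exactly one side has work. */
static int toy_queue_two_works(struct toy_stopper *s1, struct toy_stopper *s2)
{
	if (s1->pending != s2->pending)
		return -EDEADLK;

	s1->pending = true;
	s2->pending = true;
	return 0;
}

/*
 * Returns the attempt number on which queueing succeeded, or -1 if it
 * never did within max_attempts. With hammer_cpu1 set, something (think
 * stop_one_cpu() in a loop) has always queued work on the first stopper
 * by the time we look.
 */
static int toy_attempts_under_load(int max_attempts, bool hammer_cpu1)
{
	struct toy_stopper s1 = { false }, s2 = { false };
	int attempt;

	assert(max_attempts > 0);

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (hammer_cpu1)
			s1.pending = true;

		if (toy_queue_two_works(&s1, &s2) == 0)
			return attempt;

		/* The first stopper drains its queue before we retry. */
		s1.pending = false;
	}
	return -1;
}
```

In this model an idle pair succeeds on the first attempt, while the hammered
pair keeps returning -EDEADLK for as long as the caller keeps retrying.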

> +
> +	err = 0;
> +	__cpu_stop_queue_work(stopper1, work1);
> +	__cpu_stop_queue_work(stopper2, work2);
> +unlock:
> +	spin_unlock(&stopper2->lock);
> +	spin_unlock_irq(&stopper1->lock);
> +
> +	if (unlikely(err == -EDEADLK)) {
> +		cond_resched();
> +		goto retry;

And this just gives me -rt nightmares.

> +	}
> +	return err;
> +}

As it is, -rt does horrible things to stop_machine, and I would very
much like to make it such that we don't need to do that.

Now, obviously, stop_cpus() is _BAD_ for -rt, and we try real hard to
make sure that doesn't happen, but stop_one_cpu() and stop_two_cpus()
should not be a problem.

Exclusion between stop_{one,two}_cpu{,s}() and stop_cpus() makes this
trivially go away.
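
One way to picture that exclusion, as a userspace sketch only: every name
below (toy_stop_lock, toy_stop_two_cpus(), toy_stop_cpus(), the fixed-size
queues) is an assumption made for illustration, and the per-stopper locks are
left out. The point is just that if both paths queue under the same lock, a
stop_cpus()-like caller cannot interleave with a two-CPU caller, so the
relative order of their works is the same on every CPU.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS	4
#define TOY_QUEUE_LEN	8

/* Hypothetical global lock standing in for whatever excludes the two paths. */
static atomic_flag toy_stop_lock = ATOMIC_FLAG_INIT;

/* Per-CPU FIFO of work ids; a stand-in for each stopper's work list. */
static int toy_queue[TOY_NR_CPUS][TOY_QUEUE_LEN];
static int toy_queued[TOY_NR_CPUS];

static void toy_lock(void)
{
	/* Busy-wait for brevity; this is a toy spinlock, not a recommendation. */
	while (atomic_flag_test_and_set_explicit(&toy_stop_lock,
						 memory_order_acquire))
		;
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&toy_stop_lock, memory_order_release);
}

static void toy_enqueue(int cpu, int id)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(toy_queued[cpu] < TOY_QUEUE_LEN);
	toy_queue[cpu][toy_queued[cpu]++] = id;
}

/* stop_two_cpus()-like path: both works are queued under the lock. */
static void toy_stop_two_cpus(int cpu1, int cpu2, int id)
{
	toy_lock();
	toy_enqueue(cpu1, id);
	toy_enqueue(cpu2, id);
	toy_unlock();
}

/* stop_cpus()-like path: work for every CPU is queued under the same lock. */
static void toy_stop_cpus(int id)
{
	toy_lock();
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_enqueue(cpu, id);
	toy_unlock();
}
```

With this shape there is no pending-mismatch check and no retry loop, because
the reverse-order case cannot arise while the lock is held.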

Paul's RCU branch already kills try_stop_cpus() dead, so that wart is
also gone. But we're still stuck with stop_machine_from_inactive_cpu()
which does a spin-wait for exclusive state. So I suppose we'll have to
keep stop_cpus_mutex :/