Date:	Sun, 29 Sep 2013 15:56:46 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()

On 09/27, Oleg Nesterov wrote:
>
> I tried hard to find any hole in this version but failed, I believe it
> is correct.

And I still believe it is. But now I am starting to think that we
don't need cpuhp_seq (and, imo, we don't need cpuhp_waitcount either,
but this is minor).

> We need to ensure 2 things:
>
> 1. The reader should notice state = BLOCK or the writer should see
>    inc(__cpuhp_refcount). This is guaranteed by 2 mb's in
>    __get_online_cpus() and in cpu_hotplug_begin().
>
>    We do not care if the writer misses some inc(__cpuhp_refcount)
>    in per_cpu_sum(__cpuhp_refcount); that reader(s) should notice
>    state = readers_block (and inc(cpuhp_seq) can't help anyway).

Yes!
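
Just for reference, I am reading per_cpu_sum() above as the obvious
helper that sums the per-CPU counters; something like the sketch
below, modulo the exact definition in the patch:

	/* sketch only, not necessarily the patch's exact macro */
	#define per_cpu_sum(var)				\
	({							\
		typeof(var) __sum = 0;				\
		int cpu;					\
		for_each_possible_cpu(cpu)			\
			__sum += per_cpu(var, cpu);		\
		__sum;						\
	})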

> 2. If the writer sees the result of this_cpu_dec(__cpuhp_refcount)
>    from __put_online_cpus() (note that the writer can miss the
>    corresponding inc() if it was done on another CPU, so this dec()
>    can lead to sum() == 0),

But this can't happen in this version? Somehow I forgot that
__get_online_cpus() does inc/get under preempt_disable(), always on
the same CPU. And thanks to the mb's, the writer should not miss a
reader which has already passed the "state != BLOCK" check.

To simplify the discussion, let's ignore the "readers_fast" state;
the synchronize_sched() logic looks obviously correct. IOW, let's
discuss only the SLOW -> BLOCK transition.

	cpu_hotplug_begin()
	{
		state = BLOCK;

		mb();

		wait_event(cpuhp_writer,
				per_cpu_sum(__cpuhp_refcount) == 0);
	}

should work just fine? Ignoring all details, we have

	get_online_cpus()
	{
	again:
		preempt_disable();

		__this_cpu_inc(__cpuhp_refcount);

		mb();

		if (state == BLOCK) {

			mb();

			__this_cpu_dec(__cpuhp_refcount);
			wake_up_all(cpuhp_writer);

			preempt_enable();
			wait_event(state != BLOCK);
			goto again;
		}

		preempt_enable();
	}
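
And, still ignoring all details, I read the corresponding unlock path
(my sketch of the put side, not the exact code from the patch) as

	put_online_cpus()
	{
		/* sketch: order the reader's critical section before
		   the dec, like the BLOCK branch above does */
		mb();

		/* note: possibly on a different CPU than the inc() */
		this_cpu_dec(__cpuhp_refcount);

		/* prod the writer to re-evaluate per_cpu_sum() */
		wake_up_all(cpuhp_writer);
	}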

It seems to me that these mb's guarantee all we need, no?

It looks really simple. The reader can only succeed if it doesn't see
BLOCK; in this case the writer's per_cpu_sum() should see its inc().

We have

	WRITER					READER on CPU X

	state = BLOCK;				__cpuhp_refcount[X]++;

	mb();					mb();

	...
	count += __cpuhp_refcount[X];		if (state != BLOCK)
	...						return;

						mb();
						__cpuhp_refcount[X]--;

Either reader or writer should notice the STORE we care about.

If a reader can decrement __cpuhp_refcount, we have 2 cases:

	1. It is the reader holding this lock. In this case we
	   can't miss the corresponding inc() done by this reader,
	   because this reader didn't see BLOCK in the past.

	   It is just the

			A == B == 0
	   	CPU_0			CPU_1
	   	-----			-----
	   	A = 1;			B = 1;
	   	mb();			mb();
	   	b = B;			a = A;

	   pattern; at least one CPU should see 1 in its a/b (see the
	   userspace sketch after this list).

	2. It is the reader which tried to take this lock and
	   noticed state == BLOCK. We could miss the result of
	   its inc(), but we do not care; this reader is going
	   to block.

	   _If_ the reader could migrate between inc and dec,
	   then yes, we would have a problem, because that dec()
	   could make the result of per_cpu_sum() == 0. IOW, we
	   could miss the inc() but notice the dec(). But since
	   it does both on the same CPU, this is not possible.
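
If it helps, case 1 is just the standard store-buffering litmus test.
A plain userspace sketch (not kernel code; C11 seq_cst fences standing
in for mb(), all names illustrative) shows the guarantee we rely on:
with both fences in place, a == b == 0 is impossible.

	/* sketch: SB + fences, the "A == B == 0" pattern from case 1 */
	#include <stdatomic.h>
	#include <pthread.h>
	#include <stdio.h>

	static atomic_int A, B;
	static int a_seen, b_seen;

	static void *cpu0(void *arg)
	{
		(void)arg;
		atomic_store_explicit(&A, 1, memory_order_relaxed);	  /* A = 1; */
		atomic_thread_fence(memory_order_seq_cst);		  /* mb();  */
		b_seen = atomic_load_explicit(&B, memory_order_relaxed); /* b = B; */
		return NULL;
	}

	static void *cpu1(void *arg)
	{
		(void)arg;
		atomic_store_explicit(&B, 1, memory_order_relaxed);	  /* B = 1; */
		atomic_thread_fence(memory_order_seq_cst);		  /* mb();  */
		a_seen = atomic_load_explicit(&A, memory_order_relaxed); /* a = A; */
		return NULL;
	}

	int main(void)
	{
		pthread_t t0, t1;

		pthread_create(&t0, NULL, cpu0, NULL);
		pthread_create(&t1, NULL, cpu1, NULL);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);

		/* the fences forbid a_seen == 0 && b_seen == 0 */
		printf("a = %d, b = %d\n", a_seen, b_seen);
		return 0;
	}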

So why do we need cpuhp_seq?

Oleg.

