Date:	Tue, 15 Jul 2014 08:41:49 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Christoph Lameter <cl@...two.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Tejun Heo <tj@...nel.org>, David Howells <dhowells@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] percpu: add data dependency barrier in percpu
 accessors and operations

Christoph, stop arguing. Trust me, Paul knows memory ordering. You
clearly do *not*.

On Tue, Jul 15, 2014 at 8:06 AM, Christoph Lameter <cl@...two.org> wrote:
>
> The cachelines will be evicted from the other processors at
> initialization. alloc_percpu *itself* zeroes all data in each percpu area
> before returning the offset to the percpu data structure. See
> pcpu_populate_chunk(). At that point *all* other processors have those
> cachelines no longer in their caches. The initialization done with values
> specific to the subsystem is not that important.

In practice, with enough instructions in the CPU queues and
sufficiently small write buffers etc. (or with a sufficiently ordered
CPU core, like x86), that may often be true. But there is absolutely
zero reason to think it's always true.

On the writer side, if there isn't a write barrier, the actual writes
can be visible to other CPUs in arbitrary order. *Including* the
visibility of the offset before the zeroing. Really.
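
To make that concrete, here is a minimal sketch of the writer side
(hypothetical kernel-style C, not the actual pcpu code; "shared_ptr"
stands in for the percpu offset being handed out):

	/* Hypothetical publisher, not the real pcpu code. */
	struct foo {
		int data;
	};

	struct foo *shared_ptr;		/* read by other CPUs */

	void publish(struct foo *p)
	{
		p->data = 0;		/* the "zeroing" */
		/*
		 * Without this smp_wmb(), the store to shared_ptr
		 * below may become visible to another CPU *before*
		 * the store to p->data above - exactly the "offset
		 * before the zeroing" case.
		 */
		smp_wmb();
		shared_ptr = p;		/* publish the "offset" */
	}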

On the reader side, for all sane CPUs, reading the offset and then
reading data from that offset is an implicit barrier. But "all sane"
is not "all". On alpha, reading the offset does NOT guarantee that you
see later data when you use that offset to read data. In theory, it
could be due to value prediction, but in practice it's actually due to
segmented caches, so that one part of the cache has seen data that
arrived "later" (ie written _after_ the wmb on the writing CPU)
_before_ it sees data that arrived earlier. That's what the
"smp_read_barrier_depends()" protects against.

> The return value of the function is only available after
> pcpu_populate_chunk() returns.

Really, "before" and "after" have ABSOLUTELY NO MEANING unless you
have a barrier. And you're arguing against those barriers. So you
cannot use "before" as an argument, since in your world, no such thing
even exists!

There are other arguments, but they basically boil down to "no other
CPU ever accesses the per-cpu data of *this* CPU" (wrong) or "the
users will do their own barriers" (maybe true, maybe not). Your "value
is only available after" argument really isn't an argument. Not
without those barriers.
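
Put another way, in the style of Documentation/memory-barriers.txt,
the only thing that gives "after" any meaning here is the pairing of
the two barriers in the sketches above:

	CPU 0 (writer)			CPU 1 (reader)
	===============			===============
	p->data = 0;
	smp_wmb();
	shared_ptr = p;    ----->	p = shared_ptr;
					smp_read_barrier_depends();
					r = p->data;	/* sees the zeroing */

Drop either barrier and nothing orders the two loads on CPU 1 against
the two stores on CPU 0.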

            Linus
