Message-ID: <20100919152638.GF3060@linux.vnet.ibm.com>
Date: Sun, 19 Sep 2010 08:26:38 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc: Miklos Szeredi <miklos@...redi.hu>,
James Bottomley <James.Bottomley@...senPartnership.com>,
dhowells@...hat.com, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org
Subject: Re: memory barrier question

On Sun, Sep 19, 2010 at 12:47:01PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-09-17 at 16:12 -0700, Paul E. McKenney wrote:
> > On Sat, Sep 18, 2010 at 07:49:08AM +1000, Benjamin Herrenschmidt wrote:
> > >
> > > > Right but in the concrete namei example I can't see how a compiler
> > > > optimization can make a difference. The order of the loads is quite
> > > > clear:
> > > >
> > > >     LOAD inode = next.dentry->inode
> > > >     if (inode != NULL)
> > > >             LOAD inode->f_op
> > > >
> > > > What is there the compiler can optimize?
> > >
> > > Those two loads depend on each other; I don't think any implementation
> > > can re-order them. In fact, such a data dependency is typically what is
> > > used to avoid having barriers in some cases. The second load cannot be
> > > issued until the value from the first one is returned.
> >
> > Sufficiently sadistic compiler and CPU implementations could do value
> > speculation, for example, driven by profile-feedback optimization.
> > Then the guess might initially be incorrect, but then a store by some other
> > CPU could make the subsequent test decide (wrongly) that the guess had
> > in fact been correct.
> >
> > Needless to say, I am not a fan of value speculation. But other people
> > do like it a lot.
>
> Well, this verges on insanity... we get to a point where nobody's going
> to get any code right :-)
>
> I don't think the powerpc arch allows that, so that leaves us with the
> compiler, but so far I don't think gcc is -that- crazy. Those constructs
> are common enough...

Give it a few years. There are reportedly already other compilers that do
this, which is not too surprising given that the perception of insanity
is limited to lockless parallel code. If you have single-threaded code,
such as code and data under a lock (where the data is never accessed
without holding that lock), then this sort of optimization is pretty safe.
I still don't like it, but the compiler guys would argue that this is
because I am one of those insane parallel-programming guys.
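
To make that hazard concrete, here is a rough sketch (invented struct
layout and function names, not code from this thread) of what
profile-driven value speculation could effectively do to the two
dependent loads:

    struct file_operations;
    struct inode  { const struct file_operations *f_op; };
    struct dentry { struct inode *inode; };

    /* Source as written -- the second load depends on the first:
     *
     *     inode = dentry->inode;
     *     if (inode != NULL)
     *             fop = inode->f_op;
     *
     * What a value-speculating compiler (or CPU) could effectively turn
     * that into, given a profile-derived guess (assumed non-NULL):
     */
    const struct file_operations *
    speculated_lookup(struct dentry *dentry, struct inode *guess)
    {
            const struct file_operations *fop;
            struct inode *inode;

            fop = guess->f_op;              /* load issued early...        */
            inode = dentry->inode;          /* ...before this load         */
            if (inode != guess) {           /* verify the guess            */
                    fop = NULL;
                    if (inode != NULL)
                            fop = inode->f_op;  /* mispredicted: redo      */
            }
            return fop;
    }

    /* If another CPU initializes guess->f_op and then stores "guess" into
     * dentry->inode between the two loads above, the check passes even
     * though fop was fetched before that initialization was visible, so
     * the data dependency no longer orders anything.                      */
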
Furthermore, there are other ways to get into trouble. If the code
continued as follows:

    LOAD inode = next.dentry->inode
    if (inode != NULL)
            LOAD inode->f_op
            do_something_using_lots_of_registers();
            LOAD inode->some_other_field

and if the code expected ->f_op and ->some_other_field to be from the
same inode structure, severe disappointment could ensue. This is because
the compiler is within its rights to reload from next.dentry->inode,
especially given register pressure. In fact, the compiler would be within
its rights to reload from next.dentry->inode in the "LOAD inode->f_op"
statement. And it might well get NULL from such a reload.

This code sequence therefore needs -at- -least- an ACCESS_ONCE() to keep
the compiler in line.
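
For concreteness, a minimal sketch of what that fix looks like (the
struct layout, example(), and the use_*() helpers are invented to match
the pseudo-code above; ACCESS_ONCE() is the usual include/linux/compiler.h
definition):

    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

    struct file_operations;
    struct inode  { const struct file_operations *f_op; int some_other_field; };
    struct dentry { struct inode *inode; };

    extern void use_fop(const struct file_operations *);
    extern void use_field(int);
    extern void do_something_using_lots_of_registers(void);

    void example(struct dentry *dentry)
    {
            /* Exactly one load of ->inode; every later dereference goes
             * through this local, so the compiler may not reload it.   */
            struct inode *inode = ACCESS_ONCE(dentry->inode);

            if (inode != NULL) {
                    use_fop(inode->f_op);
                    do_something_using_lots_of_registers();
                    use_field(inode->some_other_field); /* same inode    */
            }
    }
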
Remember, by default, the compiler is permitted to assume that it is
generating single-threaded code.
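
The stock illustration of that assumption (generic, nothing to do with
the namei path) is a flag polled in a loop that nothing in the loop
writes:

    extern int need_to_stop;

    void wait_for_stop(void)
    {
            /* As written by someone thinking in threads: */
            while (!need_to_stop)
                    ;       /* spin */
    }

    /* Because nothing in the loop writes need_to_stop, a compiler that
     * assumes single-threaded execution may load it once:
     *
     *     tmp = need_to_stop;
     *     while (!tmp)
     *             ;
     *
     * A store from another CPU then never ends the loop.  Reading the
     * flag with ACCESS_ONCE(need_to_stop) forces a reload on each
     * iteration and forbids the hoist.                                  */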

							Thanx, Paul