Message-ID: <20130819133148.GF6726@fieldses.org>
Date: Mon, 19 Aug 2013 09:31:49 -0400
From: Bruce Fields <bfields@...ldses.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: David Howells <dhowells@...hat.com>,
Jeff Layton <jlayton@...hat.com>,
Al Viro <viro@...IV.linux.org.uk>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Matthew Wilcox <matthew@....cx>
Subject: Re: memory barriers in flock (Re: [PATCH v3] locks: close potential
race between setlease and open)
On Thu, Aug 15, 2013 at 02:31:06PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 15, 2013 at 09:44:25PM +0100, David Howells wrote:
> > Bruce Fields <bfields@...ldses.org> wrote:
> >
> > (Adding Paul McKenney who's good at this stuff)
>
> Well, I should be able to provide a more refined form of confusion...
>
> > > > v2:
> > > > - fix potential double-free of lease if second check finds conflict
> > > > - add smp_mb's to ensure that other CPUs see i_flock changes
> > > >
> > > > v3:
> > > > - remove smp_mb calls. Partial ordering is unlikely to help here.
> > >
> > > Forgive me here, I still don't understand. So to simplify massively,
> > > the situation looks like:
> > >
> > > setlease                          open
> > > ------------                      ------
> > >
> > > atomic_read                       atomic_inc
> > > write i_flock                     read i_flock
> > > atomic_read
> >
> > Are the three atomic ops reading the same value? If so, you can have smp_mb()
> > calls here:
> >
> > atomic_read                         atomic_inc
> > smp_mb()
> > write i_flock                       read i_flock
> > smp_mb()
> > atomic_read
>
> Yes, you need memory barrier of some form or another. You can use
> smp_mb__after_atomic_inc() after the atomic_inc, which is lighter
> weight on x86. Note that atomic_inc() does not return a value, and
> thus does not guarantee ordering of any sort.
>
> > I *think* that memory accesses in one place need to be reverse-ordered wrt to
> > those in the other place, so:
> >
> > atomic_read                         atomic_inc
> > smp_mb()                            smp_mb()
> > write i_flock                       read i_flock
> > atomic_read
> >
> > doesn't achieve anything.
>
> The only thing that this arrangement could achieve would be if the
> atomic_ operations are all accessing the same variable, in which case
> if the first CPU's last atomic_read preceded the atomic_inc, then
> we would know that the first CPU's first atomic_read also preceded
> that same atomic_inc.
>
> Let's add values and look at it a slightly different way:
>
> CPU 0                                 CPU 1
>
> r1 = atomic_read(&x);                 atomic_inc(&x);
> smp_mb();                             smp_mb__after_atomic_inc()
> i_flock = 1;                          r3 = i_flock;
> r2 = atomic_read(&x);
>
> Assume that the initial values of x and i_flock are zero. (This might not
> make sense in the code, but you can substitute different values without
> changing the way this works.)
>
> Then if r2==0, we know that r1==0 as well. Of course, if there were other
> atomic_inc(&x) operations in flight, it might be that r1!=r2, but we
> would know that r1 got an earlier value of x than r2. If x is 64 bits,
> then we know that r1<r2.
>
> But as Dave points out, the placement of the memory barriers prevents
> us from drawing any similar conclusions about the accesses to i_flock.
>
> So let's take a similar look at the initial arrangement:
>
> CPU 0                                 CPU 1
>
> r1 = atomic_read(&x);                 atomic_inc(&x);
> i_flock = 1;                          smp_mb__after_atomic_inc()
> smp_mb();                             r3 = i_flock;
> r2 = atomic_read(&x);
>
> Again, assume that the initial values of x and of i_flock are zero.
> Assume also that this is the only code that executes. Then if r3==0, we
> know that r2>=1. In fact, if r3==0, we know that r2==1. The load from
> i_flock had to have come before the store, and so the memory barriers
> guarantee that the load into r2 must see the results of the atomic_inc().
>
> Similarly, if r2==0, we know that CPU 0's load into r2 came before
> CPU 1's atomic_inc(). The memory barriers then guarantee that CPU 0's
> store into i_flock precedes CPU 1's load from i_flock, so that r3==1.
> Cache coherence gives us another fact. Both of CPU 0's atomic_read()s
> do volatile loads from the same variable, so they must happen in order
> (if they were not volatile, the compiler would be within its rights
> to interchange them).
OK, but the smp_mb() already guaranteed this, right?
(How would our results be different if we replaced the above by
r1 = x;                                 x++;
i_flock = 1;                            smp_mb();
smp_mb();                               r3 = i_flock;
r2 = x;
?
Could the compiler, for example, assume that x doesn't change at all
between the two assignments and not do the second read at all? But if
it's allowed to do that then I'm not sure I understand what a compiler
barrier is.)
> Therefore, because CPU 0's atomic_read() into
> r2 precedes CPU 1's atomic_inc() -- recall that r2==0 -- we know that
> CPU 0's atomic_read() into r1 must also precede CPU 1's atomic_inc(),
> meaning that we know r1==0.
>
> Reasoning about memory barriers takes this same form. If something after
> memory barrier A can be proven to have happened before something that
> came before memory barrier B, then everything before memory barrier A
> must happen before everything after memory barrier B. You carry out
> the proof by looking at loads and stores to a given variable.
>
> This is a key point. Memory barriers do nothing except for conditionally
> enforcing ordering. They are not guaranteed to commit values to memory,
> to caches, or much of anything else. Again, if you know that something
> following memory barrier A happened before something preceding memory
> barrier B, then you know that memory access preceding memory barrier A
> will be seen by a corresponding memory access following memory barrier B.
>
> > > And we want to be sure that either the setlease caller sees the result
> > > of the atomic_inc, or the opener sees the result of the write to
> > > i_flock.
> > >
> > > As an example, suppose the above steps happen in the order:
> > >
> > > atomic_read [A]
> > > write i_flock [B]
> > > atomic_read [C]
> > >                                   atomic_inc [X]
> > >                                   read i_flock [Y]
> >
> > (I've labelled the operations for convenience)
> >
> > > How do we know that the read of i_flock [Y] at the last step reflects the
> > > latest value of i_flock? For example, couldn't the write still be stuck in
> > > first CPU's cache?
> >
> > Putting in memory barriers doesn't help here. If A, B and C are all performed
> > and committed to memory by the time X happens, then Y will see B, but C will
> > not see X.
>
> Indeed, you don't get both of these at once, except by accident.
>
> > The thing to remember is that smp_mb() is not a commit operation: it doesn't
> > cause a modification to be committed to memory. The reason you use it is to
> > make sure the CPU actually does preceding memory ops - eg. makes the
> > modification in question - before it goes and does any following memory ops.
> >
> > Linux requires the system to be cache-coherent, so if the write is actually
> > performed by a CPU then the result will be obtained by any other CPU, no
> > matter whether it's still lingering in the first CPU's caches or whether it's
> > been passed on.
>
> Or some later result, but yes. Again, these accesses must be volatile
> (as in either atomic_read() or ACCESS_ONCE()), or the compiler is within
> its rights to do all sorts of mischief.
Again I'm confused here by the statement in memory-barriers.txt that
"All memory barriers except the data dependency barriers imply a
compiler barrier." Shouldn't that mean that the compiler is forbidden
any mischief that would make accesses appear to be misordered with
respect to the barriers? If a compiler barrier doesn't mean at least
that, then I can't figure out what's left that it could mean.
> > -*-
> >
> > However, I could be wrong. Memory barriers are mind-bending to try and think
> > through, especially when the operations being ordered are R-W vs R-W
> > rather than having W-W on at least one side.
>
> There are 16 combinations, all of which do something. Some cases require
> an external store to prove anything, though. For example:
>
> CPU 0: r1 = ACCESS_ONCE(x); smp_mb(); r2 = ACCESS_ONCE(y);
> CPU 1: r3 = ACCESS_ONCE(y); smp_mb(); r4 = ACCESS_ONCE(x);
> CPU 2: ACCESS_ONCE(x) = 1;
> CPU 3: ACCESS_ONCE(y) = 1;
>
> Here, we know that if r2==0&&r3==1, then it is impossible to also have
> r4==0&&r1==1. Similarly, if r4==0&&r1==1, then it is impossible to also
> have r2==0&&r3==1. If there were no stores, then of course the reads
> could not tell you anything about the ordering.
>
> > Hopefully Paul will be able to chime in
>
> Careful what you wish for! ;-)
Thanks so much for the explanations!
--b.