[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0707241208440.1433@cselinux1.cse.iitk.ac.in>
Date: Tue, 24 Jul 2007 13:04:19 +0530 (IST)
From: Satyam Sharma <ssatyam@....iitk.ac.in>
To: Nick Piggin <nickpiggin@...oo.com.au>
cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
David Howells <dhowells@...hat.com>, Andi Kleen <ak@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit()
definitions
On Tue, 24 Jul 2007, Nick Piggin wrote:
> Satyam Sharma wrote:
> > From: Satyam Sharma <ssatyam@....iitk.ac.in>
> >
> > [8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions
> >
> > > From Documentation/atomic_ops.txt, those archs that require explicit
> > memory barriers around clear_bit() must also implement these two interfaces.
> > However, for i386, clear_bit() is a strict, locked, atomic and
> > un-reorderable operation and includes an implicit memory barrier already.
> >
> > But these two functions have been wrongly defined as "barrier()" which is
> > a pointless _compiler optimization_ barrier, and only serves to make gcc
> > not do legitimate optimizations that it could have otherwise done.
> >
> > So let's make these proper no-ops, because that's exactly what we require
> > these to be on the i386 platform.
>
> No. clear_bit is not a compiler barrier on i386,
Obvious.
> thus smp_mb__before/after
> must be.
Not so obvious. Why do we require these to be a full compiler barrier
is precisely the question I raised here.
Consider this (the above two functions exist only for clear_bit(),
the atomic variant, as you already know), the _only_ memory reference
we care about is that of the address of the passed bit-string:
(1) The compiler must not optimize / elid it (i.e. we need to disallow
compiler optimization for that reference) -- but we've already taken
care of that with the __asm__ __volatile__ and the constraints on
the memory "addr" operand there, and,
(2) For the i386, it also includes an implicit memory (CPU) barrier
already.
So I /think/ it makes sense to let the compiler optimize _other_ memory
references across the call to clear_bit(). There's a difference. I think
we'd be safe even if we do this, because the synchronization in callers
must be based upon the _passed bit-string_, otherwise _they_ are the
ones who're buggy.
[ However, elsewhere Jeremy Fitzhardinge mentioned the case of
some callers, for instance, doing a memset() on an alias of
the same bit-string. But again, I think that is dodgy/buggy/
extremely border-line usage on the caller's side itself ...
*unless* the caller is doing that inside a higher-level lock
anyway, in which case he wouldn't be needing to use the
locked variants either ... ]
[ For those interested, I've been looking at the code generated
for the test kernel I built with these patches, and I don't
really see anything gcc did that it shouldn't have -- but ok,
that doesn't mean other versions/toolchains for other setups
won't. Also, the test box has been up all night, but I'm only
running Firefox on it anyway, and don't really know how to
verify if I've introduced any correctness issues / bugs. ]
Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists