Message-ID: <4A4D2239.5000602@gmail.com>
Date: Thu, 02 Jul 2009 23:10:17 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: David Howells <dhowells@...hat.com>, mingo@...e.hu,
akpm@...ux-foundation.org, paulus@...ba.org, arnd@...db.de,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] FRV: Implement atomic64_t
Linus Torvalds wrote:
>
> On Wed, 1 Jul 2009, David Howells wrote:
>> +
>> +#define ATOMIC64_INIT(i) { (i) }
>> +#define atomic64_read(v) ((v)->counter)
>> +#define atomic64_set(v, i) (((v)->counter) = (i))
>
> These seem to be buggy.
>
> At least "atomic64_read()" needs to make sure to actually read it
> atomically - otherwise you'll do two 32-bit reads, and that just gets
> crap. Imagine if somebody is adding 1 to 0x00000000ffffffff, and then
> "atomic64_read()" reads it as two accesses in the wrong place, and gets
> either 0, or 0x00000001ffffffff, both of which are totally incorrect.
>
> The case of 'atomic64_set()' is _slightly_ less clear, because I think we
> use it mostly for initializers, so atomicity is often not strictly
> required. But at least on x86, we do guarantee that it sets it atomically
> too.
>
> Btw, Ingo: I looked at the x86-32 versions to be sure, and noticed a
> couple of buglets:
>
> - atomic64_xchg uses "atomic_read()". Sure, it happens to work, since
> the "atomic_read()" is not type-safe, and gets a non-atomic 64-bit
> read, but that looks really really bogus.
>
> It _should_ use __atomic64_read(), and the 64-bit versions should use a
> different counter name ("counter64"?) or we should use an inline
> function for atomic_read(), so that the type safety issue gets fixed.
>
> - atomic64_read() is being stupid with the whole loop thing. It _should_
> just do
>
> 	static inline unsigned long long atomic64_read(atomic64_t *ptr)
> 	{
> 		unsigned long long old = __atomic64_read(ptr);
> 		return cmpxchg8b(ptr, old, old);
> 	}
>
> and that's it. No loop. cmpxchg8b() will return the right thing.
Using a fixed initial value (instead of __atomic64_read()) is even faster;
it apparently permits the CPU to use an appropriate bus transaction.
static inline unsigned long long atomic64_read(atomic64_t *ptr)
{
	unsigned long long old = 0LL;

	return cmpxchg8b(&ptr->counter, old, old);
}
I also rewrote cmpxchg8b() to avoid the %edi register and use a generic "+m" constraint instead.
static inline unsigned long long
cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new)
{
	unsigned long low = new;
	unsigned long high = new >> 32;

	asm volatile(
		LOCK_PREFIX "cmpxchg8b %1\n"
		: "+A" (old), "+m" (*ptr)
		: "b" (low), "c" (high)
	);
	return old;
}
I got a 4x speedup on a dual quad-core (Intel E5450) machine when all CPUs try
to *read* the same atomic64 location.

I tried various initial values and got an additional 5% speedup by choosing a
value *most probably* different from the actual atomic64 one,
like (1LL << 32), with nice asm output...
static inline unsigned long long atomic64_read(atomic64_t *ptr)
{
	unsigned long long old = (1LL << 32);

	return cmpxchg8b(&ptr->counter, old, old);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/