Message-ID: <4A4D2FDA.3000509@gmail.com>
Date: Fri, 03 Jul 2009 00:08:26 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: David Howells <dhowells@...hat.com>, mingo@...e.hu,
akpm@...ux-foundation.org, paulus@...ba.org, arnd@...db.de,
linux-kernel@...r.kernel.org
Subject: [PATCH] x86: atomic64_t should be 8 bytes aligned
Linus Torvalds wrote:
>
> On Thu, 2 Jul 2009, Eric Dumazet wrote:
>> Using a fixed initial value (instead of __atomic64_read()) is even faster,
>> it apparently permits cpu to use an appropriate bus transaction.
>
> Yeah, I guess it does a "read-for-write-ownership" and allows the thing to
> be done as a single cache transaction.
>
> If we read it first, it will first get the cacheline for shared-read, and
> then the cmpxchg8b will need to turn it from shared to exclusive.
>
> Of course, the _optimal_ situation would be if the cmpxchg8b didn't
> actually do the write at all when the value matches (and all cores could
> just keep it shared), but I guess that's not going to happen.
>
> Too bad there is no pure 8-byte read op. Using MMX has too many downsides.
>
> Btw, your numbers imply that for the atomic64_add_return(), we really
> would be much better off not reading the original value at all. Again, in
> that case, we really do want the "read-for-write-ownership" cache
> transaction, not a read.
>
I forgot to mention that if an atomic64_t spans two cache lines, my test
program is 10x slower.
So we probably need an __attribute__((aligned(8))) on the atomic64_t definition
as well, since alignof(atomic64_t) is 4, not 8 (!!!)
#include <stdio.h>

typedef struct {
	unsigned long long counter;
} atomic64_t;

typedef struct {
	unsigned long long __attribute__((aligned(8))) counter;
} atomic64_ta;

struct {
	int a;
	atomic64_t counter;
} s __attribute__((aligned(64)));

struct {
	int a;
	atomic64_ta counter;
} sa __attribute__((aligned(64)));

int main()
{
	printf("alignof(atomic64_t)=%d\n", __alignof__(atomic64_t));
	printf("alignof(atomic64_ta)=%d\n", __alignof__(atomic64_ta));
	printf("alignof(unsigned long long)=%d\n", __alignof__(unsigned long long));
	printf("&s.counter=%p\n", &s.counter);
	printf("&sa.counter=%p\n", &sa.counter);
	return 0;
}

$ gcc -O2 -o test test.c ; ./test
alignof(atomic64_t)=4
alignof(atomic64_ta)=8
alignof(unsigned long long)=8
&s.counter=0x80496c4
&sa.counter=0x8049708
$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.3.3/configure --enable-languages=c,c++ --prefix=/usr
Thread model: posix
gcc version 4.3.3 (GCC)
[PATCH] x86: atomic64_t should be 8 bytes aligned
LOCKED instructions on two cache lines are painful.
Make sure an atomic64_t is 8-byte aligned.
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..1927a56 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -250,7 +250,7 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)
/* An 64bit atomic type */
typedef struct {
- unsigned long long counter;
+ unsigned long long __attribute__((__aligned__(8))) counter;
} atomic64_t;
#define ATOMIC64_INIT(val) { (val) }
--