Message-ID: <alpine.LFD.0.999.0708162033400.30176@woody.linux-foundation.org>
Date: Thu, 16 Aug 2007 20:42:23 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Paul Mackerras <paulus@...ba.org>
cc: Nick Piggin <nickpiggin@...oo.com.au>,
Segher Boessenkool <segher@...nel.crashing.org>,
heiko.carstens@...ibm.com, horms@...ge.net.au,
linux-kernel@...r.kernel.org, rpjday@...dspring.com, ak@...e.de,
netdev@...r.kernel.org, cfriesen@...tel.com,
akpm@...ux-foundation.org, jesper.juhl@...il.com,
linux-arch@...r.kernel.org, zlynx@....org, satyam@...radead.org,
clameter@....com, schwidefsky@...ibm.com,
Chris Snook <csnook@...hat.com>,
Herbert Xu <herbert.xu@...hat.com>, davem@...emloft.net,
wensong@...ux-vs.org, wjiang@...ilience.com
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
architectures
On Fri, 17 Aug 2007, Paul Mackerras wrote:
>
> I'm really surprised it's as much as a few K. I tried it on powerpc
> and it only saved 40 bytes (10 instructions) for a G5 config.
One of the things that "volatile" generally screws up is a simple

	volatile int i;

	i++;

which a compiler will generally get horribly, horribly wrong.
In a reasonable world, gcc should just make that be

	addl $1,i(%rip)

on x86-64, which is indeed what it does without the volatile. But with the
volatile, the compiler gets really nervous, and doesn't dare do it in one
instruction, and thus generates crap like

	movl i(%rip), %eax
	addl $1, %eax
	movl %eax, i(%rip)

instead. For no good reason, except that "volatile" just doesn't have any
good/clear semantics for the compiler, so most compilers will just make it
be "I will not touch this access in any way, shape, or form". Including
even trivially correct instruction optimization/combination.
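
For anyone who wants to look at this themselves, here is a minimal sketch
of the comparison. The file name and the function and variable names are
made up for illustration, and what you actually get depends on the
compiler version and flags, so treat the comments as what to look for
rather than a guaranteed result.

```c
/*
 * Hypothetical test file (say, volatile_inc.c). Build with something like
 *     gcc -O2 -S volatile_inc.c
 * and compare the assembly emitted for the two functions.
 */

volatile int vi;	/* volatile-qualified counter */
int pi;			/* plain counter, same type otherwise */

/* Increment of the volatile object: look for a separate load, add, store. */
void inc_volatile(void)
{
	vi++;
}

/* Increment of the plain object: look for a single add-to-memory. */
void inc_plain(void)
{
	pi++;
}
```

Both functions do the same thing as far as the C program is concerned; only
the generated instructions are expected to differ.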
This is one of the reasons why we should never use "volatile". It
pessimises code generation for no good reason - just because compilers
don't know what the heck it even means!
Now, people don't do "i++" on atomics (you'd use "atomic_inc()" for that),
but people *do* do things like

	if (atomic_read(..) <= 1)
		..

On ppc, things like that probably don't much matter. But on x86, it makes
a *huge* difference whether you do

	movl i(%rip),%eax
	cmpl $1,%eax

or if you can just use the value directly for the operation, like this:

	cmpl $1,i(%rip)

which is again a totally obvious and totally safe optimization, but is
(again) something that gcc doesn't dare do, since "i" is volatile.
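
A rough sketch of the same comparison for the read-and-compare case. The
type and helpers here (my_atomic_t, read_volatile, read_plain and the two
last_ref functions) are hypothetical stand-ins written for this example,
not the kernel's atomic_t or atomic_read(), and the code generation noted
in the comments is something to check with "gcc -O2 -S" on your own
compiler, not a promise.

```c
/* Hypothetical stand-in for an atomic counter type. */
typedef struct {
	int counter;
} my_atomic_t;

/* Read through a volatile-qualified pointer, forcing a volatile access. */
static inline int read_volatile(const my_atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Plain read with no volatile qualifier anywhere. */
static inline int read_plain(const my_atomic_t *v)
{
	return v->counter;
}

/* Look for a load into a register followed by a compare. */
int last_ref_volatile(const my_atomic_t *v)
{
	return read_volatile(v) <= 1;
}

/* Look for a compare that uses the memory operand directly. */
int last_ref_plain(const my_atomic_t *v)
{
	return read_plain(v) <= 1;
}
```

Both functions return the same value for the same input; the interesting
part is only whether the compare takes its operand from memory or from a
register loaded just before it.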
In other words: "volatile" is a horribly, horribly bad way of doing things,
because it generates *worse* code, for no good reason. You just don't see
it on powerpc, because it's already a load-store architecture, so there is
no "good code" for doing direct-to-memory operations.
Linus