[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.0.999.0707091256510.3412@woody.linux-foundation.org>
Date: Mon, 9 Jul 2007 13:08:10 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Davide Libenzi <davidel@...ilserver.org>
cc: Nick Piggin <npiggin@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: queued spinlock code and results
On Mon, 9 Jul 2007, Linus Torvalds wrote:
>
> There are no issues with the 255-CPU cap on 32-bit x86. It's just not
> relevant to anybody. So the _only_ thing that matters is speed and to a
> secondary degree size.
..of course, from a pure speed standpoint, the "lock dec" one seems to
be the fastest, with the difference bwteen the 16-bit/32-bit "lock xadd"
being comparatively totally in the noise.
Which is what I'd expect.
The difference between a 16-bit and 32-bit xadd should basically not be
likely to be really measurable (ie we're likely talking about a single CPU
cycle - if that - for the decode of the operand size override, and since
both variants need it for _one_ of the operations, it likely ends up being
about instruction scheduling noise), while the difference between a "dec"
and "xadd" could be the difference between a native uop and microcoded.
[ Not that "xadd" couldn't be as fast as a "dec" in theory, but it's much
less likely to be that. It obviously has to actually write to two
targets: the register -and- memory, and that tends to require at least
an extra uop.
And together with being a r-op-w memory instruction to begin with (which
is generally the "most complex" normal instruction), and not a very
often used instruction, the end result is that it would often tend to be
handled specially somehow - either in a special decode unit, or as
actual microcode. ]
So from a pure performance standpoint, xadd will likely continue to lose
against dec. So the reason to choose xadd in the first place isn't "best
performance", but "best performance given fairness".
And any performance difference between xadd and dec is going to be much
bigger than any difference between 16/32-bit versions of xadd.
So I wouldn't get too hung up on a potential single cycle, and it's
arguably more important to make the (inlined) "unlock" thing be as simple
and small as possible.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists