lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 9 Jul 2007 13:08:10 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Davide Libenzi <davidel@...ilserver.org>
cc:	Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: queued spinlock code and results



On Mon, 9 Jul 2007, Linus Torvalds wrote:
> 
> There are no issues with the 255-CPU cap on 32-bit x86. It's just not 
> relevant to anybody. So the _only_ thing that matters is speed and to a 
> secondary degree size.

..of course, from a pure speed standpoint, the "lock dec" one seems to 
be the fastest, with the difference bwteen the 16-bit/32-bit "lock xadd" 
being comparatively totally in the noise.

Which is what I'd expect.

The difference between a 16-bit and 32-bit xadd should basically not be 
likely to be really measurable (ie we're likely talking about a single CPU 
cycle - if that - for the decode of the operand size override, and since 
both variants need it for _one_ of the operations, it likely ends up being 
about instruction scheduling noise), while the difference between a "dec" 
and "xadd" could be the difference between a native uop and microcoded.

[ Not that "xadd" couldn't be as fast as a "dec" in theory, but it's much 
  less likely to be that. It obviously has to actually write to two 
  targets: the register -and- memory, and that tends to require at least 
  an extra uop.

  And together with being a r-op-w memory instruction to begin with (which 
  is generally the "most complex" normal instruction), and not a very 
  often used instruction, the end result is that it would often tend to be 
  handled specially somehow - either in a special decode unit, or as 
  actual microcode. ]

So from a pure performance standpoint, xadd will likely continue to lose 
against dec. So the reason to choose xadd in the first place isn't "best 
performance", but "best performance given fairness".

And any performance difference between xadd and dec is going to be much 
bigger than any difference between 16/32-bit versions of xadd.

So I wouldn't get too hung up on a potential single cycle, and it's 
arguably more important to make the (inlined) "unlock" thing be as simple 
and small as possible.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ