linux-kernel - Re: queued spinlock code and results

Message-ID: <alpine.LFD.0.999.0707091256510.3412@woody.linux-foundation.org>
Date:	Mon, 9 Jul 2007 13:08:10 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Davide Libenzi <davidel@...ilserver.org>
cc:	Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: queued spinlock code and results

On Mon, 9 Jul 2007, Linus Torvalds wrote:
> 
> There are no issues with the 255-CPU cap on 32-bit x86. It's just not 
> relevant to anybody. So the _only_ thing that matters is speed and to a 
> secondary degree size.

..of course, from a pure speed standpoint, the "lock dec" one seems to 
be the fastest, with the difference between the 16-bit/32-bit "lock xadd" 
variants being comparatively in the noise.

Which is what I'd expect.

The difference between a 16-bit and a 32-bit xadd is unlikely to be 
measurable (ie we're likely talking about a single CPU cycle - if that - 
for the decode of the operand size override, and since both variants need 
it for _one_ of the operations, it likely ends up being about instruction 
scheduling noise), while the difference between a "dec" and an "xadd" 
could be the difference between a native uop and a microcoded instruction.

[ Not that "xadd" couldn't be as fast as a "dec" in theory, but it's much 
  less likely to be that. It obviously has to actually write to two 
  targets: the register -and- memory, and that tends to require at least 
  an extra uop.

  And together with being a read-op-write memory instruction to begin with 
  (which is generally the "most complex" normal instruction), and not a 
  very commonly used one, the end result is that it would often tend to be 
  handled specially somehow - either in a special decode unit, or as 
  actual microcode. ]
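
To make the "two targets" point concrete, here is a minimal sketch of the two kinds of primitive using C11 atomics. The helper names (xadd16, xadd32, dec_went_negative) are hypothetical and not from the patches under discussion. On x86 a compiler will usually lower a fetch-add whose result is used to "lock xadd", but that is an assumption about code generation, so check the disassembly before relying on it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* xadd-style: memory is updated AND the old value comes back in a
 * register, so the instruction has two results to produce. */
static inline uint16_t xadd16(_Atomic uint16_t *p, uint16_t v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_acquire);
}

/* Same operation on a 32-bit word; only the operand size differs. */
static inline uint32_t xadd32(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_acquire);
}

/* dec-style: the caller only needs to know whether the new value went
 * negative (a sign-flag test in hand-written asm), not the old value. */
static inline bool dec_went_negative(_Atomic int8_t *p)
{
    int8_t old = atomic_fetch_sub_explicit(p, 1, memory_order_acquire);
    return (int8_t)(old - 1) < 0;
}
```

The 16-bit and 32-bit helpers are identical apart from the type, which is why any cost difference between them would come down to instruction encoding rather than to the work done.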

So from a pure performance standpoint, xadd will likely continue to lose 
against dec. So the reason to choose xadd in the first place isn't "best 
performance", but "best performance given fairness".
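
As an illustration of what "fairness" buys, the sketch below contrasts the acquire side of a dec-style lock with a ticket-style lock built on fetch-add. It is not the code from this thread: the type and function names are hypothetical, the two 8-bit counters are an assumed layout (chosen because 8-bit tickets match the 255-CPU cap mentioned in the quoted text), and the release side is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* dec-style: 1 = free, <= 0 = held. Whichever CPU's decrement happens
 * to see the free value wins, so waiters get no ordering guarantee. */
typedef struct { _Atomic int8_t slock; } dec_lock_t;

static inline bool dec_trylock(dec_lock_t *l)
{
    return atomic_fetch_sub_explicit(&l->slock, 1,
                                     memory_order_acquire) == 1;
}

static inline void dec_lock(dec_lock_t *l)
{
    while (!dec_trylock(l)) {
        /* Spin read-only until the holder marks it free, then race. */
        while (atomic_load_explicit(&l->slock,
                                    memory_order_relaxed) <= 0)
            ;
    }
}

/* ticket-style: fetch-add hands every waiter a unique ticket, and
 * tickets are served strictly in the order they were taken (FIFO). */
typedef struct {
    _Atomic uint8_t next;   /* next ticket to hand out */
    _Atomic uint8_t owner;  /* ticket currently being served */
} ticket_lock_t;

static inline uint8_t ticket_take(ticket_lock_t *l)
{
    return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

static inline bool ticket_is_my_turn(ticket_lock_t *l, uint8_t t)
{
    return atomic_load_explicit(&l->owner, memory_order_acquire) == t;
}

static inline void ticket_lock(ticket_lock_t *l)
{
    uint8_t t = ticket_take(l);
    while (!ticket_is_my_turn(l, t))
        ;
}
```

The ticket version needs the old counter value back from the atomic add, which is exactly what forces an xadd-like operation instead of a plain decrement.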

And any performance difference between xadd and dec is going to be much 
bigger than any difference between 16/32-bit versions of xadd.

So I wouldn't get too hung up on a potential single cycle, and it's 
arguably more important to make the (inlined) "unlock" thing be as simple 
and small as possible.
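
For a sense of how small the inlined unlock can be, here is a hedged sketch of the release side for both styles of lock, again with C11 atomics and hypothetical names. It assumes a dec-style lock word where 1 means free, and a ticket-style lock whose 8-bit "owner" counter is written only by the current holder. On x86 a release store typically compiles to a plain "mov", though that is an assumption worth verifying in the generated code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* dec-style lock: release is a single store of the "free" value. */
static inline void dec_release(_Atomic int8_t *slock)
{
    atomic_store_explicit(slock, 1, memory_order_release);
}

/* ticket-style lock: only the holder ever writes `owner`, so a plain
 * load followed by a release store is enough; no locked read-modify-
 * write is needed. The 8-bit counter wraps from 255 back to 0. */
static inline void ticket_release(_Atomic uint8_t *owner)
{
    uint8_t cur = atomic_load_explicit(owner, memory_order_relaxed);
    atomic_store_explicit(owner, (uint8_t)(cur + 1),
                          memory_order_release);
}
```

Both release paths avoid an atomic read-modify-write, which keeps the code emitted at every unlock site short.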

			Linus