Message-ID: <alpine.LFD.1.10.0808131119290.3462@nehalem.linux-foundation.org>
Date: Wed, 13 Aug 2008 11:27:14 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
cc: Steven Rostedt <rostedt@...dmis.org>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Andi Kleen <andi@...stfloor.org>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Miller <davem@...emloft.net>,
Roland McGrath <roland@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Gregory Haskins <ghaskins@...ell.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
Clark Williams <williams@...hat.com>
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
On Wed, 13 Aug 2008, Mathieu Desnoyers wrote:
>
> I also did some microbenchmarks on my Intel Xeon 64 bits, AMD64 and
> Intel Pentium 4 boxes to compare a baseline
Note that the biggest problems of a jump-based nop are likely to happen
when there are I$ misses and/or when there are other jumps involved. Ie
some microarchitectures tend to have issues with jumps to jumps, or when
there are multiple control changes in the same (possibly partial)
cacheline, because the instruction stream prediction may be predecoded in
the L1 I$, and multiple branches in the same cacheline - or in the same
execution cycle - can pollute that kind of thing.
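To make the comparison concrete, here is a minimal sketch of the two 5-byte sequences being discussed: a jump-based nop (a near jmp with a zero displacement, so it "falls through" to the next instruction) versus the architectural multi-byte NOP from the recommended encodings in the Intel manuals. The byte values are from the x86 ISA; the `is_branch5` helper is purely illustrative and not anything from the patches under discussion.

```c
#include <stddef.h>

/* A 5-byte "jump-based nop": jmp rel32 with a zero displacement,
 * i.e. a branch to the very next instruction.  0xe9 is the near-jmp
 * opcode, so the predecode/branch machinery still sees a control
 * transfer here. */
static const unsigned char jmp_nop5[5]  = { 0xe9, 0x00, 0x00, 0x00, 0x00 };

/* The architectural 5-byte NOP, NOPL 0x0(%rax,%rax,1), from the
 * recommended multi-byte NOP sequences in the Intel manuals: same
 * length, so it can occupy the same patch site, but it is not a
 * branch at all. */
static const unsigned char real_nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Illustrative helper: true if the 5-byte sequence is a near jmp,
 * i.e. a control transfer the branch predictors have to track. */
static int is_branch5(const unsigned char *insn)
{
	return insn[0] == 0xe9;
}
```

Both sequences patch into the same 5 bytes; the difference the mail is pointing at is that only the first one adds a branch to the cacheline's predecoded prediction state.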
So microbenchmarking this way will probably make some things look
unrealistically good.
On the P4, the trace cache makes things even more interesting, since it's
another level of I$ entirely, with very different behavior for the hit
case vs the miss case.
And I$ misses for the kernel are actually fairly high. Not in
microbenchmarks that tend to have very repetitive behavior and a small I$
footprint, but in a lot of real-life loads the *bulk* of all action is in
user space, and then the kernel side is often invoked with few loops (the
kernel has very few loops indeed) and a cold I$.
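The hot-I$ effect being described is easy to reproduce in miniature: a microbenchmark that hammers the same few instructions millions of times keeps its whole footprint resident in the L1 I$ with fully warmed branch predictors, which is exactly what a cold kernel entry path does not look like. A hedged sketch, where `timed_nop_loop` is a hypothetical name and the loop body is just a stand-in for a patched nop site:

```c
/* Sketch of a typical microbenchmark loop: the same handful of
 * instructions executed back to back, so the I$ and predictors are
 * always hot.  The volatile sink keeps the compiler from deleting
 * the loop; real harnesses would time this with rdtsc or
 * clock_gettime() around the call. */
static unsigned long timed_nop_loop(unsigned long iters)
{
	volatile unsigned long sink = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		sink += i;	/* stand-in for the patched nop site */
	return sink;
}
```

Numbers from a loop like this mostly measure the hit case; the cold-I$ miss case that dominates real kernel invocations never shows up.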
So your numbers are interesting, but it would be really good to also get
some info from Intel/AMD who may know about microarchitectural issues for
the cases that don't show up in the hot-I$-cache environment.
Linus