linux-kernel - Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do

Message-ID: <20131113135309.GA27006@gmail.com>
Date:	Wed, 13 Nov 2013 14:53:09 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	David Laight <David.Laight@...LAB.COM>
Cc:	Neil Horman <nhorman@...driver.com>, Joe Perches <joe@...ches.com>,
	netdev <netdev@...r.kernel.org>, Dave Jones <davej@...hat.com>,
	linux-kernel@...r.kernel.org, sebastien.dugue@...l.net,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Eric Dumazet <eric.dumazet@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

* David Laight <David.Laight@...LAB.COM> wrote:

> > > I'm not sure, what's the typical capacity of the branch predictor's 
> > > ability to remember code paths?
> ...
> > 
> > For such simple single-target branches it goes near or over a thousand 
> > for recent Intel and AMD microarchitectures. Thousands for really 
> > recent CPUs.
> 
> IIRC the x86 can also correctly predict simple sequences - like a branch 
> in a loop that is taken every other iteration, or only after a previous 
> branch is taken.

They tend to be rather capable but not very well documented :) With a 
large out-of-order execution design and 20+ pipeline stages, x86 branch 
prediction accuracy is perhaps the most important design aspect for good 
CPU performance.
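
To make the "taken every other iteration" case concrete, here is a toy 
simulation in C. It is only a sketch of the textbook two-level (history 
indexed) scheme, not a description of what any real x86 predictor does; 
the function names, the 2-bit history length and the initial counter 
values are all assumptions made for illustration.

```c
#include <stddef.h>

/* 2-bit saturating counter: 0 and 1 predict not-taken, 2 and 3 predict taken. */
static unsigned update_counter(unsigned ctr, int taken)
{
	if (taken)
		return ctr < 3 ? ctr + 1 : 3;
	return ctr > 0 ? ctr - 1 : 0;
}

/* Mispredictions of one 2-bit counter with no history (hypothetical model). */
static size_t mispredicts_bimodal(const int *outcomes, size_t n)
{
	unsigned ctr = 1;
	size_t miss = 0;

	for (size_t i = 0; i < n; i++) {
		int pred = ctr >= 2;

		if (pred != (outcomes[i] != 0))
			miss++;
		ctr = update_counter(ctr, outcomes[i]);
	}
	return miss;
}

/*
 * Mispredictions when the last two outcomes of the branch select one of
 * four 2-bit counters (hypothetical two-level model).
 */
static size_t mispredicts_two_level(const int *outcomes, size_t n)
{
	unsigned table[4] = { 1, 1, 1, 1 };
	unsigned hist = 0;
	size_t miss = 0;

	for (size_t i = 0; i < n; i++) {
		int taken = outcomes[i] != 0;
		int pred = table[hist] >= 2;

		if (pred != taken)
			miss++;
		table[hist] = update_counter(table[hist], taken);
		hist = ((hist << 1) | (unsigned)taken) & 3;
	}
	return miss;
}
```

Fed a strictly alternating taken/not-taken sequence, the single counter 
in this model keeps flipping and mispredicts throughout, while the 
history-indexed table settles after a couple of iterations.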

> Much simpler cpus may use a much simpler strategy.

Yeah. The patches in this thread are about the x86 assembly implementation 
of the csum routines, and for 'typical' x86 CPUs the branch prediction 
units and caches are certainly sophisticated enough.

Also note that here, for real use cases, the csum routines are (or should 
be) memory-bandwidth limited: they miss the data cache most of the time 
and leave the pipeline partially idle, whereas branch prediction accuracy 
matters most when the pipeline is well fed and there are a lot of 
instructions in flight.
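
For illustration only, the memory-bound shape of such a routine can be 
sketched in portable C. This is not the kernel's do_csum() and not the 
patch under discussion: csum_simple(), csum_fold16() and PREFETCH_AHEAD 
are hypothetical names, the prefetch distance is an untuned guess, and 
__builtin_prefetch() is the GCC/Clang builtin. The sum is the 16-bit 
ones' complement sum over big-endian words, returned without the final 
inversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in bytes; a real value needs measurement. */
#define PREFETCH_AHEAD 256

/* Fold the carries of a wide accumulator back into 16 bits. */
static uint16_t csum_fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* 16-bit ones' complement sum of buf, with a software prefetch hint. */
static uint16_t csum_simple(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i = 0;

	for (; i + 1 < len; i += 2) {
		/* Once per 64 bytes, hint at data we will need soon. */
		if ((i & 63) == 0 && i + PREFETCH_AHEAD < len)
			__builtin_prefetch(buf + i + PREFETCH_AHEAD);
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	}
	if (i < len)	/* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[i] << 8;

	return csum_fold16(sum);
}
```

The loop does one add per 16-bit word, so on a buffer that is not in 
cache the time is spent waiting for data rather than on the loop branch, 
which is the case where a prefetch hint has a chance to help.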

Thanks,

	Ingo