Date:	Wed, 13 Nov 2013 11:01:03 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	David Laight <David.Laight@...LAB.COM>
Cc:	Ingo Molnar <mingo@...nel.org>, Joe Perches <joe@...ches.com>,
	netdev <netdev@...r.kernel.org>, Dave Jones <davej@...hat.com>,
	linux-kernel@...r.kernel.org, sebastien.dugue@...l.net,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Eric Dumazet <eric.dumazet@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

On Wed, Nov 13, 2013 at 01:32:50PM -0000, David Laight wrote:
> > > I'm not sure, what's the typical capacity of the branch predictor
> > > to remember code paths?
> ...
> > 
> > For such simple single-target branches it goes near or over a thousand for
> > recent Intel and AMD microarchitectures. Thousands for really recent CPUs.
> 
> IIRC the x86 can also correctly predict simple sequences - like a branch
> in a loop that is taken every other iteration, or only after a previous
> branch is taken.
> 
> Much simpler cpus may use a much simpler strategy.
> I think one I've used (a fpga soft-core cpu) just uses the low
> bits of the instruction address to index a single bit table.
> This means that branches alias each other.
> To get consistent cycle counts, and so minimise the worst-case
> code path, we had to disable the dynamic prediction.
> 
> For the checksum code the loop branch isn't a problem.
> Tests on entry to the function might get mispredicted.
> 
> So if you have a conditional prefetch when the buffer is long,
> then time a short buffer after 100 long ones, you'll almost
> certainly see the misprediction penalty.
> 
> FWIW I remember speeding up a copy (I think) loop on a StrongARM by
> adding an extra instruction to fetch a word from later in the buffer
> into a register I never otherwise used.
> (That was an unpaged system so I knew it couldn't fault.)
> 
Fair enough, but the code we're looking at here is arch specific.  If StrongARMs
benefit from different coding patterns, we can handle that in that arch's code.
This x86 implementation doesn't need to worry about branch prediction, since x86
hardware handles it well.
Neil
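
For illustration only, here is a minimal sketch of the pattern under
discussion: a checksum loop that prefetches ahead only when the buffer is
long.  It is not the do_csum patch itself.  The names csum_sketch,
PREFETCH_MIN_LEN and PREFETCH_AHEAD, and their values, are assumptions made
up for this example, and __builtin_prefetch is the GCC/Clang builtin rather
than the kernel's prefetch helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen only for this sketch. */
#define PREFETCH_MIN_LEN 256	/* only prefetch for buffers at least this long */
#define PREFETCH_AHEAD   64	/* how far ahead of the current byte to prefetch */

/*
 * Simplified 16-bit ones' complement sum over a byte buffer.
 * Words are read little-endian; a trailing odd byte is added as-is.
 */
static uint16_t csum_sketch(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i = 0;
	/* This is the test on entry that may be mispredicted. */
	int want_prefetch = len >= PREFETCH_MIN_LEN;

	for (; i + 1 < len; i += 2) {
		/* Prefetch once per PREFETCH_AHEAD bytes, staying inside the buffer. */
		if (want_prefetch && (i % PREFETCH_AHEAD) == 0 &&
		    i + PREFETCH_AHEAD < len)
			__builtin_prefetch(buf + i + PREFETCH_AHEAD);
		sum += (uint64_t)buf[i] | ((uint64_t)buf[i + 1] << 8);
	}
	if (i < len)
		sum += buf[i];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The loop branch itself is easy to predict; the length test is the part whose
outcome changes between short and long buffers.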

> 	David
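
The single-bit table described in the quoted text can be modelled in a few
lines.  This is a toy simulation under stated assumptions, not a description
of any real CPU: the table size, the addresses, and the names (struct
predictor, predict_and_update, and so on) are all hypothetical.

```c
#include <stdint.h>

#define TABLE_BITS 4			/* hypothetical: 16-entry table */
#define TABLE_SIZE (1u << TABLE_BITS)

/* One bit per entry: was the last branch that hit this slot taken? */
struct predictor {
	uint8_t taken[TABLE_SIZE];
};

/* Predict from the table, then record the real outcome.  Returns 1 on a miss. */
static int predict_and_update(struct predictor *p, uint32_t addr, int taken)
{
	uint32_t idx = addr & (TABLE_SIZE - 1);	/* low address bits only */
	int miss = p->taken[idx] != (taken != 0);

	p->taken[idx] = (taken != 0);
	return miss;
}

/* 100 "long buffer" calls (branch taken), then one "short" call (not taken). */
static unsigned count_misses_long_then_short(void)
{
	struct predictor p = {{0}};
	unsigned misses = 0;
	int i;

	for (i = 0; i < 100; i++)
		misses += predict_and_update(&p, 0x1000, 1);
	misses += predict_and_update(&p, 0x1000, 0);
	return misses;
}

/* Branch A is always taken, branch B never; run them alternately n times. */
static unsigned run_pair(uint32_t addr_a, uint32_t addr_b, int n)
{
	struct predictor p = {{0}};
	unsigned misses = 0;
	int i;

	for (i = 0; i < n; i++) {
		misses += predict_and_update(&p, addr_a, 1);
		misses += predict_and_update(&p, addr_b, 0);
	}
	return misses;
}
```

In this model the long-then-short sequence misses on the first long call and
again on the short one.  Two branches whose addresses share the same low bits
(for example 0x1000 and 0x1010 here) keep overwriting each other's entry and
miss every time, while 0x1000 and 0x1001 map to different slots and miss only
once.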