netdev - RE: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do


Message-ID: <AE90C24D6B3A694183C094C60CF0A2F6026B7412@saturn3.aculab.com>
Date:	Wed, 13 Nov 2013 13:32:50 -0000
From:	"David Laight" <David.Laight@...LAB.COM>
To:	"Ingo Molnar" <mingo@...nel.org>,
	"Neil Horman" <nhorman@...driver.com>
Cc:	"Joe Perches" <joe@...ches.com>, "netdev" <netdev@...r.kernel.org>,
	"Dave Jones" <davej@...hat.com>, <linux-kernel@...r.kernel.org>,
	<sebastien.dugue@...l.net>, "Thomas Gleixner" <tglx@...utronix.de>,
	"Ingo Molnar" <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
	<x86@...nel.org>, "Eric Dumazet" <eric.dumazet@...il.com>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Subject: RE: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

> > I'm not sure, whats the typical capacity for the branch predictors
> > ability to remember code paths?
...
> 
> For such simple single-target branches it goes near or over a thousand for
> recent Intel and AMD microarchitectures. Thousands for really recent CPUs.

IIRC the x86 can also correctly predict simple sequences - like a branch
in a loop that is taken every other iteration, or only after a previous
branch is taken.
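
As a rough illustration of why an "every other iteration" branch is
predictable, here is a toy model of a history-based predictor. It is only
a sketch: the 2-bit history, the 2-bit saturating counters and the name
count_mispredicts() are assumptions made for the example, not a
description of what any real x86 core does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-level predictor for ONE branch: the last HIST_BITS
 * outcomes select a 2-bit saturating counter (0..3, >= 2 means "taken"). */
#define HIST_BITS 2

/* Returns how many of the n recorded outcomes would be mispredicted. */
size_t count_mispredicts(const uint8_t *taken, size_t n)
{
    uint8_t counters[1u << HIST_BITS] = {0}; /* all start "strongly not taken" */
    unsigned hist = 0;                       /* recent outcomes, newest in bit 0 */
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        int actual = taken[i] != 0;
        uint8_t *c = &counters[hist];

        if ((*c >= 2) != actual)
            misses++;

        /* Train the counter selected by the current history. */
        if (actual) {
            if (*c < 3)
                (*c)++;
        } else {
            if (*c > 0)
                (*c)--;
        }

        /* Shift the new outcome into the history register. */
        hist = ((hist << 1) | (unsigned)actual) & ((1u << HIST_BITS) - 1);
    }
    return misses;
}
```

Fed a strictly alternating taken/not-taken pattern, this model only
mispredicts during a short warm-up, because the two phases of the
pattern end up using different counters.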

Much simpler CPUs may use a much simpler strategy.
I think one I've used (an FPGA soft-core CPU) just uses the low
bits of the instruction address to index a table of single bits.
This means that branches alias each other.
To get consistent cycle counts, and so minimise the worst-case
code path, we had to disable the dynamic prediction.
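
The aliasing can be shown with a small model of such a table. This is a
sketch under assumptions: the 16-entry table, the 4-byte instruction
size and all the names below are invented for the example and are not
taken from any particular soft core.

```c
#include <stdint.h>

#define TABLE_BITS 4
#define TABLE_SIZE (1u << TABLE_BITS)

/* One bit per entry: "was the last branch that mapped here taken?" */
struct onebit_predictor {
    uint8_t last_taken[TABLE_SIZE];
};

/* Assumes 4-byte instructions, so the two lowest address bits are dropped. */
static unsigned slot(uint32_t pc)
{
    return (pc >> 2) & (TABLE_SIZE - 1);
}

/* Returns 1 if the prediction was correct, then records the outcome. */
int predict_and_update(struct onebit_predictor *p, uint32_t pc, int taken)
{
    unsigned i = slot(pc);
    int actual = taken != 0;
    int hit = (p->last_taken[i] == actual);

    p->last_taken[i] = (uint8_t)actual;
    return hit;
}

/* Branch A is always taken, branch B never; count the mispredictions. */
unsigned run_pair(uint32_t pc_a, uint32_t pc_b, unsigned iterations)
{
    struct onebit_predictor p = { {0} };
    unsigned misses = 0;

    for (unsigned n = 0; n < iterations; n++) {
        misses += !predict_and_update(&p, pc_a, 1);
        misses += !predict_and_update(&p, pc_b, 0);
    }
    return misses;
}
```

With run_pair(0x1000, 0x1004, 100) the two branches use different
entries and only the first one is mispredicted. With
run_pair(0x1000, 0x1040, 100) they share an entry and keep overwriting
each other's bit, so every prediction is wrong.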

For the checksum code the loop branch isn't a problem.
Tests on entry to the function might get mispredicted.

So if you have a conditional prefetch that is only done when the
buffer is long, and then time a short buffer after 100 long ones,
you'll almost certainly see the misprediction penalty.
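
Something like the following shows the shape of that test. It is a
sketch, not the do_csum code: sum_bytes_sketch(), warm_then_short() and
both thresholds are hypothetical, and a plain byte sum stands in for the
real checksum. __builtin_prefetch is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_MIN_LEN 256 /* hypothetical "long buffer" threshold */
#define PREFETCH_AHEAD    64 /* hypothetical distance, one cache line */

uint32_t sum_bytes_sketch(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    /* Entry test: after many long buffers the predictor expects this
     * branch to be taken, so the first short buffer pays for a miss. */
    if (len >= PREFETCH_MIN_LEN)
        __builtin_prefetch(buf + PREFETCH_AHEAD);

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Train the entry branch with 100 long buffers, then do one short one. */
uint32_t warm_then_short(const uint8_t *long_buf, size_t long_len,
                         const uint8_t *short_buf, size_t short_len)
{
    volatile uint32_t sink = 0; /* stops the warm-up calls being discarded */

    for (int i = 0; i < 100; i++)
        sink = sum_bytes_sketch(long_buf, long_len);
    (void)sink;

    /* Read a cycle counter here ... */
    uint32_t r = sum_bytes_sketch(short_buf, short_len);
    /* ... and again here; the difference includes any mispredict. */
    return r;
}
```

For a real measurement the summing function would need to be kept out
of line, otherwise the compiler is free to specialise the entry test
for each call site.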

FWIW I remember speeding up a copy loop (I think) on a StrongARM by
adding an extra instruction to fetch a word from later in the buffer
into a register I never otherwise used.
(That was an unpaged system so I knew it couldn't fault.)
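
In C the trick looks roughly like this. It is a sketch, not the original
StrongARM code: the function name and the read-ahead distance are
assumptions, and the bounds check is an addition because a hosted
program must not read past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define AHEAD_WORDS 8 /* hypothetical read-ahead distance: 32 bytes */

void copy_words_touch_ahead(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    /* volatile so the compiler cannot drop the otherwise unused load */
    volatile uint32_t sink = 0;

    for (size_t i = 0; i < nwords; i++) {
        /* Touch a word further on so its cache line is already being
         * fetched by the time the copy reaches it. */
        if (i + AHEAD_WORDS < nwords)
            sink = src[i + AHEAD_WORDS];
        dst[i] = src[i];
    }
    (void)sink;
}
```

Whether this helps at all depends on the cache and memory system. On
CPUs with a hardware prefetcher or a real prefetch instruction the
extra load may well be pure overhead.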

	David
