Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1CCD63C9@AcuExch.aculab.com>
Date: Thu, 4 Feb 2016 11:08:45 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Tom Herbert' <tom@...bertland.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
"kernel-team@...com" <kernel-team@...com>
Subject: RE: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64
From: Tom Herbert
> Sent: 03 February 2016 19:19
...
> + /* Main loop */
> +50: adcq 0*8(%rdi),%rax
> + adcq 1*8(%rdi),%rax
> + adcq 2*8(%rdi),%rax
> + adcq 3*8(%rdi),%rax
> + adcq 4*8(%rdi),%rax
> + adcq 5*8(%rdi),%rax
> + adcq 6*8(%rdi),%rax
> + adcq 7*8(%rdi),%rax
> + adcq 8*8(%rdi),%rax
> + adcq 9*8(%rdi),%rax
> + adcq 10*8(%rdi),%rax
> + adcq 11*8(%rdi),%rax
> + adcq 12*8(%rdi),%rax
> + adcq 13*8(%rdi),%rax
> + adcq 14*8(%rdi),%rax
> + adcq 15*8(%rdi),%rax
> + lea 128(%rdi), %rdi
> + loop 50b
I'd need convincing that unrolling the loop like that gives any significant gain.
You have a dependency chain on the carry flag, so there will be delays between the 'adcq'
instructions (these may be more significant than the memory reads from L1 cache).
I also don't remember (might be wrong) the 'loop' instruction being executed quickly.
If 'loop' is fast then you will probably find that:
10:	adcq	0(%rdi),%rax
	lea	8(%rdi),%rdi
	loop	10b
is just as fast since the three instructions could all be executed in parallel.
But I suspect that 'dec %cx; jnz 10b' is actually better (and might execute as
a single micro-op).
IIRC 'adc' and 'dec' will both have dependencies on the flags register
so cannot execute together (which is a shame here).
It is also possible that breaking the carry-chain dependency by doing 32-bit
adds (possibly after 64-bit reads) would be faster.
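A rough C sketch of that last idea, for illustration only: each 64-bit word is
read once, then its two 32-bit halves are added into separate 64-bit
accumulators, so no add depends on the carry flag from the previous one. The
names fold64, sum_adc and sum_split32 are made up here and are not from the
patch; sum_adc just models the 'adcq' chain so the two results can be compared.
Whether the split form is actually faster would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones'-complement sum. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Model of the adcq chain: 64-bit adds with end-around carry.
 * Every iteration depends on the carry out of the previous add. */
static uint64_t sum_adc(const uint8_t *buf, size_t nwords)
{
	uint64_t acc = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w;

		memcpy(&w, buf + 8 * i, sizeof(w));	/* unaligned-safe 64-bit read */
		acc += w;
		if (acc < w)				/* carry out: wrap it back in */
			acc++;
	}
	return acc;
}

/* Hypothetical alternative: 64-bit reads, 32-bit halves summed into two
 * independent 64-bit accumulators. Neither accumulator can overflow for
 * fewer than 2^32 words, so no carry has to be propagated in the loop. */
static uint64_t sum_split32(const uint8_t *buf, size_t nwords)
{
	uint64_t lo = 0, hi = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w;

		memcpy(&w, buf + 8 * i, sizeof(w));
		lo += (uint32_t)w;	/* low 32 bits */
		hi += w >> 32;		/* high 32 bits */
	}
	return lo + hi;		/* safe for nwords < 2^31 */
}
```

For the same buffer, fold64(sum_adc(buf, n)) and fold64(sum_split32(buf, n))
should agree, since 2^32 and 2^64 are both congruent to 1 modulo 0xffff.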
David