[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Udgf3jySASPSRKfvjZY5dthnScd++GpHX1PkdUe4E=YPA@mail.gmail.com>
Date: Thu, 4 Feb 2016 13:09:49 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Tom Herbert <tom@...bertland.com>
Cc: David Laight <David.Laight@...lab.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
"kernel-team@...com" <kernel-team@...com>
Subject: Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64
On Thu, Feb 4, 2016 at 12:59 PM, Tom Herbert <tom@...bertland.com> wrote:
> On Thu, Feb 4, 2016 at 9:09 AM, David Laight <David.Laight@...lab.com> wrote:
>> From: Tom Herbert
>> ...
>>> > If nothing else reducing the size of this main loop may be desirable.
>>> > I know the newer x86 is supposed to have a loop buffer so that it can
>>> > basically loop on already decoded instructions. Normally it is only
>>> > something like 64 or 128 bytes in size though. You might find that
>>> > reducing this loop to that smaller size may improve the performance
>>> > for larger payloads.
>>>
>>> I saw 128 to be better in my testing. For large packets this loop does
>>> all the work. I see performance dependent on the amount of loop
>>> overhead, i.e. we got it down to two non-adcq instructions but it is
>>> still noticeable. Also, this helps a lot on sizes up to 128 bytes
>>> since we only need to do single call in the jump table and no trip
>>> through the loop.
>>
>> But one of your 'loop overhead' instructions is 'loop'.
>> Look at http://www.agner.org/optimize/instruction_tables.pdf
>> you don't want to be using 'loop' on intel cpus.
>>
> I'm not following. We can replace loop with decl %ecx and jg, but why
> is that better?
Because loop takes something like 7 cycles whereas the decl/jg
approach takes 2 or 3. It is probably one of the reasons things look
so much better with the loop unrolled.
- Alex
Powered by blists - more mailing lists