Message-ID: <87wprmean1.fsf@tassilo.jf.intel.com>
Date: Wed, 06 Jan 2016 12:05:54 -0800
From: Andi Kleen <andi@...stfloor.org>
To: Tom Herbert <tom@...bertland.com>
Cc: <davem@...emloft.net>, <netdev@...r.kernel.org>,
<kernel-team@...com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<hpa@...or.com>, <x86@...nel.org>
Subject: Re: [PATCH v2 net-next] net: Implement fast csum_partial for x86_64
Tom Herbert <tom@...bertland.com> writes:
> Also, we don't do anything special for alignment, unaligned
> accesses on x86 do not appear to be a performance issue.
This is not true on Atom CPUs.
Also, on most CPUs there is still a larger penalty when an access
crosses a cache line.
> Verified correctness by testing arbitrary length buffer filled with
> random data. For each buffer I compared the computed checksum
> using the original algorithm for each possible alignment (0-7 bytes).
>
> Checksum performance:
>
> Isolating old and new implementation for some common cases:
You forgot to state the CPU. The results likely depend heavily
on the microarchitecture.
The original C code was optimized for K8 FWIW.
Overall your assembler looks similar to the C code, except for the jump
table. A jump table has the disadvantage that it is much harder to
branch-predict, with a large penalty if it's mispredicted.
I would expect it to be slower for cases where the length
changes frequently. Did you benchmark that case?
-Andi