Message-ID: <CALx6S35bpBZ-cVPjMiKBE978sGt1_5+bgwLDRfgR+ZumhBf8YA@mail.gmail.com>
Date: Tue, 8 Mar 2016 08:49:55 -0800
From: Tom Herbert <tom@...bertland.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Alexander Duyck <alexander.duyck@...il.com>,
David Laight <David.Laight@...lab.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
"kernel-team@...com" <kernel-team@...com>
Subject: Re: [PATCH v5 net-next] net: Implement fast csum_partial for x86_64
On Mon, Mar 7, 2016 at 5:39 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Mon, Mar 7, 2016 at 4:07 PM, Tom Herbert <tom@...bertland.com> wrote:
>>
>> As I said previously, if alignment really is a factor then we can
>> check up front if a buffer crosses a page boundary and call the slow
>> path function (original code). I'm seeing a 1 nsec hit to add this
>> check.
>
> It shouldn't be a factor, and you shouldn't check for it. My code was
> self-aligning, and had at most one unaligned access at the beginning
> (the data of which was then used to align the rest).
>
Yes, but the logic to do the alignment does not come for free. The
intent of these patches is really to speed up checksums over small
buffers (like the checksum over the IP header, or pulling up checksums
over protocol headers when dealing with checksum-complete). For
checksums over larger buffers, e.g. TCP/UDP checksums, we are depending
on checksum offload (there are still some cases where the host will
need to checksum a whole packet, but as vendors move to providing
protocol-agnostic checksums those should go away). In the VXLAN GRO
path, for instance, we do a checksum pull over both the UDP header and
the VXLAN header, each of which is 8 bytes. csum_partial can be trivially
implemented for a buffer of length 8 with three adcq instructions (as
in my patch). When we're using VXLAN over IPv4, both the VXLAN and
UDP headers will likely not be eight-byte aligned, but alignment seems to
only be an issue when crossing a page boundary. The probability that
an 8-byte header crosses a page boundary is already very low, and
with a little bit of code drivers could probably guarantee
that packet headers don't straddle page boundaries. So the effort to
align small buffers, assuming they don't straddle page boundaries,
seems to provide little or no value.
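A portable C sketch of the 8-byte case described above, assuming a 4 KiB page size and native-order (csum_partial-style) results. This is not the inline-asm code from the patch: a single 64-bit add followed by an end-around carry stands in for the adcq sequence, and the names csum8, csum_ref, crosses_page and csum_small (plus SKETCH_PAGE_SIZE) are hypothetical, chosen only for illustration. crosses_page() corresponds to the up-front page-boundary test mentioned in the quoted text, with csum_ref() playing the role of the slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages (the x86_64 base page size). */
#define SKETCH_PAGE_SIZE 4096UL

/* 64-bit add with end-around carry: the portable analogue of addq + adcq $0. */
static uint64_t add_with_carry(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);
}

/* Fold a 64-bit ones'-complement accumulator down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffffULL) + (sum >> 16);
    sum = (sum & 0xffffULL) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical slow path: plain loop over native-order 16-bit words. */
static uint16_t csum_ref(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    uint64_t sum = seed;
    uint16_t w;

    for (; len >= 2; p += 2, len -= 2) {
        memcpy(&w, p, 2);
        sum += w;
    }
    if (len) {
        unsigned char tail[2] = { p[0], 0 };  /* zero-pad the odd trailing byte */
        memcpy(&w, tail, 2);
        sum += w;
    }
    return fold64(sum);
}

/* Hypothetical fast path for exactly 8 bytes: one load, one add with carry, fold. */
static uint16_t csum8(const void *buf, uint32_t seed)
{
    uint64_t word;
    memcpy(&word, buf, sizeof(word));  /* well-defined even if buf is unaligned */
    return fold64(add_with_carry(word, seed));
}

/* True if [buf, buf + len) touches two pages. */
static bool crosses_page(const void *buf, size_t len)
{
    uintptr_t off = (uintptr_t)buf & (SKETCH_PAGE_SIZE - 1);
    return len != 0 && off + len > SKETCH_PAGE_SIZE;
}

/* Hypothetical dispatcher: fast path only for 8-byte, non-straddling headers. */
static uint16_t csum_small(const void *buf, size_t len, uint32_t seed)
{
    if (len == 8 && !crosses_page(buf, len))
        return csum8(buf, seed);
    return csum_ref(buf, len, seed);
}
```

Because the ones'-complement sum is independent of byte order and word grouping (RFC 1071), folding one 64-bit load gives the same 16-bit result as summing the four 16-bit words one at a time; memcpy keeps the possibly unaligned load well-defined in C, and on x86_64 compilers typically turn it into a single mov.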
Tom
> Tom had a version that used that. Although now that I look back at it,
> it seems to be broken by some confusion about the one-byte alignment
> vs 8-byte alignment.
>
> Linus
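To illustrate the one-byte vs 8-byte alignment issue raised in the quoted text, here is a hedged C sketch of a self-aligning checksum. It is not Linus's code (it uses a simple byte/word prologue rather than a single unaligned load up front), and csum_selfalign, csum_ref and load16 are hypothetical names. The key detail is that if the buffer starts at an odd address, pairing bytes by absolute address puts every byte in the other half of its 16-bit word, so the folded result has to be byte-swapped before the caller's seed is added. The generic lib/checksum.c code handles odd addresses in a similar way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit add with end-around carry. */
static uint64_t add_with_carry(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);
}

/* Fold a 64-bit ones'-complement accumulator down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffffULL) + (sum >> 16);
    sum = (sum & 0xffffULL) + (sum >> 16);
    return (uint16_t)sum;
}

/* Native-order 16-bit word from b0 (lower address) and b1 (higher address). */
static uint16_t load16(unsigned char b0, unsigned char b1)
{
    unsigned char tmp[2] = { b0, b1 };
    uint16_t w;
    memcpy(&w, tmp, 2);
    return w;
}

/* Reference: bytes paired relative to the start of the buffer. */
static uint16_t csum_ref(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    uint64_t sum = seed;

    for (; len >= 2; p += 2, len -= 2)
        sum = add_with_carry(sum, load16(p[0], p[1]));
    if (len)
        sum = add_with_carry(sum, load16(p[0], 0));
    return fold64(sum);
}

/* Self-aligning sum: bytes paired by absolute address, 8-byte-aligned body. */
static uint16_t csum_selfalign(const void *buf, size_t len, uint32_t seed)
{
    const unsigned char *p = buf;
    int odd = (int)((uintptr_t)p & 1);
    uint64_t sum = 0, w64;
    uint16_t r;

    if (odd && len) {
        /* First byte is the high half of a word whose low half is a virtual zero. */
        sum = add_with_carry(sum, load16(0, p[0]));
        p++;
        len--;
    }
    /* p is now even; step 2 bytes at a time until it is 8-byte aligned. */
    while (len >= 2 && ((uintptr_t)p & 7)) {
        sum = add_with_carry(sum, load16(p[0], p[1]));
        p += 2;
        len -= 2;
    }
    for (; len >= 8; p += 8, len -= 8) {  /* aligned 64-bit body */
        memcpy(&w64, p, 8);
        sum = add_with_carry(sum, w64);
    }
    for (; len >= 2; p += 2, len -= 2)
        sum = add_with_carry(sum, load16(p[0], p[1]));
    if (len)
        sum = add_with_carry(sum, load16(p[0], 0));

    r = fold64(sum);
    if (odd)  /* undo the one-byte lane shift */
        r = (uint16_t)((r << 8) | (r >> 8));
    /* Add the seed only after the swap so that it is not rotated. */
    return fold64((uint64_t)r + seed);
}
```

Dropping the final swap for odd start addresses yields a result that is rotated by one byte, which is an easy bug to introduce when mixing one-byte and 8-byte alignment handling.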