Date:	Tue, 8 Mar 2016 08:49:55 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	David Laight <David.Laight@...lab.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
	"kernel-team@...com" <kernel-team@...com>
Subject: Re: [PATCH v5 net-next] net: Implement fast csum_partial for x86_64

On Mon, Mar 7, 2016 at 5:39 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Mon, Mar 7, 2016 at 4:07 PM, Tom Herbert <tom@...bertland.com> wrote:
>>
>> As I said previously, if alignment really is a factor then we can
>> check up front if a buffer crosses a page boundary and call the slow
>> path function (original code). I'm seeing a 1 nsec hit to add this
>> check.
>
> It shouldn't be a factor, and you shouldn't check for it. My code was
> self-aligning, and had at most one unaligned access at the beginning
> (the data of which was then used to align the rest).
>
Yes, but the logic to do the alignment does not come for free. The
intent of these patches is really to speed up checksums over small
buffers (like the checksum over the IP header or pulling up checksums
over protocol headers for dealing with checksum-complete). For
checksums over larger buffers, e.g. TCP/UDP checksums, we are depending
on checksum offload (there are still some cases where the host will
need to compute a packet checksum, but as vendors move to providing
protocol-agnostic checksums those should go away). In the VXLAN GRO
path, for instance, we do a checksum pull over both the UDP header and
the VXLAN header, each of which is 8 bytes. csum_partial can be
trivially implemented for a buffer of length 8 with three adcq
instructions (as in my patch). When we're using VXLAN in IPv4, both
the VXLAN and UDP headers will likely not be eight-byte aligned, but
alignment seems to only be an issue when crossing a page boundary. The
probability that an 8-byte header crosses a page boundary is already
very low, and with a little bit of code drivers could probably pretty
much guarantee that packet headers don't straddle page boundaries. So
it seems like the effort to align small buffers, assuming they don't
straddle page boundaries, provides little or no value.
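
To make the 8-byte case concrete, here is a minimal portable C sketch of
the idea. It is not the code from the patch: the function names are
hypothetical, the add-with-carry is written in plain C rather than with
adcq, and the 4096-byte page size is an assumption. It does one
unaligned 8-byte load, adds the incoming sum with end-around carry,
folds the result to 32 bits, and separately shows the kind of up-front
page-straddle check discussed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for illustration only; real code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* 64-bit add with end-around carry, the effect an add/adc pair has on x86_64. */
static inline uint64_t add64_with_carry(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);   /* s < a means the addition wrapped; add the carry back */
}

/* Fold a 64-bit ones' complement accumulator down to 32 bits. */
static inline uint32_t fold64_to_32(uint64_t x)
{
    x = (x & 0xffffffffu) + (x >> 32);
    x = (x & 0xffffffffu) + (x >> 32);   /* second pass absorbs the carry */
    return (uint32_t)x;
}

/* Hypothetical helper: partial checksum over exactly 8 bytes, any alignment. */
static inline uint32_t csum_partial_8(const void *buf, uint32_t sum)
{
    uint64_t word;
    memcpy(&word, buf, sizeof(word));    /* compiles to a single unaligned load */
    return fold64_to_32(add64_with_carry(word, sum));
}

/* Fold a 32-bit partial sum to 16 bits (not yet complemented). */
static inline uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: does [buf, buf + len) cross a page boundary? len > 0. */
static inline int straddles_page(const void *buf, size_t len)
{
    uintptr_t first = (uintptr_t)buf;
    uintptr_t last = first + len - 1;
    return (first / SKETCH_PAGE_SIZE) != (last / SKETCH_PAGE_SIZE);
}
```

Under these assumptions, an 8-byte header at a uniformly random byte
offset straddles a page for only 7 of the 4096 possible offsets, which
is why a check like straddles_page() would almost never take the slow
path.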

Tom

> Tom had a version that used that. Although now that I look back at it,
> it seems to be broken by some confusion about the one-byte alignment
> vs 8-byte alignment.
>
>              Linus
