netdev - Re: [RFC] Kernel unaligned access at __skb_flow

Message-ID: <CAKgT0UdLw62nxCdiieLQrZxkyyoJxSpK1AmCP63SQeCPCpNW3g@mail.gmail.com>
Date:	Fri, 29 Jan 2016 15:00:28 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Sowmini Varadhan <sowmini.varadhan@...cle.com>,
	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] Kernel unaligned access at __skb_flow_dissect

On Fri, Jan 29, 2016 at 2:28 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:
>
>> It also means DMA becomes dramatically slower as it introduces a
>> partial write access for the start of every frame.  It is why we had
>> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
>> when unaligned than reading unaligned IP headers.
>
> Well, I guess that if you have an arch where DMA accesses are slow and
> NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
> better than others.

The other bit you forgot to mention was an IOMMU.  That is another
per-architecture thing that can really slow us down.  Back when I
rewrote the receive path I was dealing with a number of performance
complaints on PowerPC.  The approach I took with the Intel drivers was
supposed to be the best compromise for IOMMU, DMA alignment, and IP
header alignment.
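
To make the trade-off concrete, here is a minimal sketch of the offset
arithmetic, not code from any driver.  It assumes a receive buffer that
starts on an aligned boundary and a plain 14-byte Ethernet header with no
VLAN tag; the helper names are made up for illustration, and ip_align
stands in for the value of NET_IP_ALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define ETH_HLEN 14 /* Ethernet header: 6 (dst) + 6 (src) + 2 (type) bytes */

/* Hypothetical helper: offset at which the device starts writing the
 * frame, given ip_align pad bytes reserved at the front of the buffer. */
static size_t dma_start_offset(size_t ip_align)
{
    return ip_align;
}

/* Hypothetical helper: offset of the IP header, which follows the
 * Ethernet header. */
static size_t ip_header_offset(size_t ip_align)
{
    return ip_align + ETH_HLEN;
}

/* Returns nonzero if offset is a multiple of align. */
static int is_aligned(size_t offset, size_t align)
{
    return offset % align == 0;
}

/* Print both offsets and their 4-byte alignment for one pad value. */
static void show_layout(size_t ip_align)
{
    size_t dma = dma_start_offset(ip_align);
    size_t ip = ip_header_offset(ip_align);

    printf("pad=%zu: DMA starts at %zu (%s), IP header at %zu (%s)\n",
           ip_align,
           dma, is_aligned(dma, 4) ? "aligned" : "unaligned",
           ip, is_aligned(ip, 4) ? "aligned" : "unaligned");
}
```

Calling show_layout(0) and show_layout(2) shows the two cases: with no
pad the DMA write starts aligned but the IP header sits at offset 14,
and with a 2-byte pad the IP header moves to offset 16 while the DMA
write starts at offset 2.  Only one of the two can be 4-byte aligned for
a 14-byte link header.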

>>
>> The gain on recvmsg would probably be minimal.  The only time I have
>> seen any significant speed-up for copying is if you can get both ends
>> aligned to something like 16B.
>
> On modern intel cpus, this does not matter at all, sure. It took a while
> before "rep movsb" finally did the right thing.
>
> memcpy() and friends implementations are much slower on some older
> arches (when dealing with unaligned src/dst)
>
> arch/mips/lib/memcpy.S is a gem ;)

Yeah.  I can imagine.  The fact is you can't make everybody happy, so I
am good with just trying to support the majority of architectures as
best as possible.  If a few have to take a performance hit for an
unaligned memcpy, then so be it.

- Alex