netdev - Re: [PATCH 3/3] net: fec: align IP header in hardware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <638d2aa0-eb4a-dd4b-e516-d7ad89eca627@nelint.com>
Date:   Fri, 30 Sep 2016 07:16:12 -0700
From:   Eric Nelson <eric@...int.com>
To:     David Laight <David.Laight@...LAB.COM>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc:     "linux@....linux.org.uk" <linux@....linux.org.uk>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "fugang.duan@....com" <fugang.duan@....com>,
        "otavio@...ystems.com.br" <otavio@...ystems.com.br>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "troy.kisky@...ndarydevices.com" <troy.kisky@...ndarydevices.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "u.kleine-koenig@...gutronix.de" <u.kleine-koenig@...gutronix.de>
Subject: Re: [PATCH 3/3] net: fec: align IP header in hardware

Hi David,

On 09/30/2016 06:49 AM, David Laight wrote:
> From: Eric Nelson
>> Sent: 30 September 2016 14:27
>> Thanks for the feedback David,
>>
>> On 09/29/2016 04:07 AM, David Laight wrote:
>>> From: Eric Nelson
>>>> Sent: 28 September 2016 18:15
>>>> On 09/28/2016 09:42 AM, David Laight wrote:
>>>>> From: Eric Nelson
>>>>>> Sent: 26 September 2016 19:40
>>>>>> Hi David,
>>>>>>
>>>>>> On 09/26/2016 02:26 AM, David Laight wrote:
>>>>>>> From: Eric Nelson
>>>>>>>> Sent: 24 September 2016 15:42
>>>>>>>> The FEC receive accelerator (RACC) supports shifting the data payload of
>>>>>>>> received packets by 16-bits, which aligns the payload (IP header) on a
>>>>>>>> 4-byte boundary, which is, if not required, at least strongly suggested
>>>>>>>> by the Linux networking layer.
>>>>>>> ...
>>>>>>>> +		/* align IP header */
>>>>>>>> +		val |= FEC_RACC_SHIFT16;
>>>>>>>
>>>>>>> I can't help feeling that there needs to be corresponding
>>>>>>> changes to increase the buffer size by 2 (maybe for large mtu)
>>>>>>> and to discard two bytes from the frame length.
>>>>>>>
>>>>>>
>>>>>> In the normal case, the fec driver over-allocates all receive packets to
>>>>>> be of size FEC_ENET_RX_FRSIZE (2048) minus the value of rx_align,
>>>>>> which is either 0x0f (ARM) or 0x03 (PPC).
>>>>>>
>>>>>> If the frame length is less than rx_copybreak (typically 256), then
>>>>>> the frame length from the receive buffer descriptor is used to
>>>>>> control the allocation size for a copied buffer, and this will include
>>>>>> the two bytes of padding if RACC_SHIFT16 is set.
>>>>>>
>>>>>>> If probably ought to be predicated on NET_IP_ALIGN as well.
>>>>>>>
>>>>>> Can you elaborate?
>>>>>
>>>>> From reading this it seems that the effect of FEC_RACC_SHIFT16 is to
>>>>> add two bytes of 'junk' to the start of every receive frame.
>>>>>
>>>>
>>>> That's right. Two bytes of junk between the MAC header and the
>>>> IP header.
>>>>
>>>>> In the 'copybreak' case the new skb would need to be 2 bytes shorter
>>>>> than the length reported by the hardware, and the data copied from
>>>>> 2 bytes into the dma buffer.
>>>>>
>>>>
>>>> As it stands, the skb allocated by the copybreak routine will include
>>>> the two bytes of padding, and the call to skb_pull_inline will ignore
>>>> them.
>>>
>>> Ok, I didn't see that call being added by this patch.
>>>
>>>>> The extra 2 bytes also mean the that maximum mtu that can be received
>>>>> into a buffer is two bytes less.
>>>>>
>>>>
>>>> Right, but I think the max is already high enough that this isn't a
>>>> problem.
>>>>
>>>>> If someone sets the mtu to (say) 9k for jumbo frames this might matter.
>>>>> Even with fixed 2048 byte buffers it reduces the maximum value the mtu
>>>>> can be set to by 2.
>>>>>
>>>>
>>>> As far as I can tell, the fec driver doesn't support jumbo frames, and
>>>> the max frame length is currently hard-coded at PKT_MAXBUF_SIZE (1522).
>>>>
>>>> This is well within the 2048-byte allocation, even with optional headers
>>>> for VLAN etc.
>>>
>>> Hmm...
>>>
>>> That (probably) means all the skb the driver allocates are actually 4k.
>>> It would be much better to reduce the size so that the entire skb
>>> (with packet buffer) is less than 2k.
>>>
>>
>> That seems worthwhile, but un-related to this patch.
> 
> Indeed.
> 
>> It appears to me that the received packets could be allocated as
>>
>> PKT_MAXBUF_SIZE+fep->rx_align+NET_IP_ALIGN
>>
>> (+2 if FEC_RACC_SHIFT16 is used)
> 
> No.
> The packet buffers need to be allocated NET_IP_ALIGN + PKT_MAXBUF_SIZE
> byte long and (I assume) aligned on a fep->rx_align byte boundary.
> 

I think we're saying the same thing here, with the exception of the
+2 for FEC_RACC_SHIFT16.

> If NET_IP_ALIGN is set (to 2) then FEC_RACC_SHIFT16 must also me set
> so that the ethernet frame itself is 4n+2 aligned.
> 

This patch does this, but not with the beginning of the skb.

It also does this when NET_IP_ALIGN is zero though, and I believe this
is the right thing, so the IP header is aligned in a sensible way.

The driver can't handle a DMA to (4n+2) on any architecture.

>>>>> Now if NET_IP_ALIGN is zero then it is fine for the rx frame to start
>>>>> on a 4n boundary, and the skb are likely to be allocated that way.
>>>>> In this case you don't want to extra two bytes of 'junk'.
>>>>>
>>>> NET_IP_ALIGN is defaulting to 2 by the conditional in skbuff.h
>>>
>>> Even though it is always currently set is isn't really ideal to have
>>> a driver that breaks if it isn't set.
>>> This could easily happen at some point in the future if the ethernet
>>> logic is put with a different cpu.
>>>
>>
>> After multiple reads, I'm confused about the meaning of NET_IP_ALIGN
>> and how it should be used.
>>
>> From Documentation/unaligned-memory-access.txt, I take it that this
>> should be configured on a per-architecture basis, and it seems to be
>> set to zero on both PPC and x86.
>>
>> I wonder if this is proper though. It seems that its' use might depend
>> on the I/O subsystem(s) in use as much as the architecture.
> ...
> 
> If the cpu cannot do misaligned memory cycles then NET_IP_ALIGN must be 2
> and all receive frames must be aligned like that.
> 

On ARM, the CPU can't handle misaligned memory cycles without
taking an alignment fault and NET_IP_ALIGN is set to 2.

On PPC, NET_IP_ALIGN is set to zero.

I could use some help from NXP about whether the driver is used on
PPC, but I don't think it can DMA to 4n+2 addresses on any architecture
and the purpose of this patch is to align the frame on a (4n+2)
address.

> If the cpu can do misaligned memory cycles then the alignment of receive
> ethernet frames doesn't matter that much.
> NET_IP_ALIGN is likely to be set to zero because the cost of the cpu
> doing misaligned transfers it likely to be a lot less than that of
> un-optimised dma accesses to misaligned memory [1] [2].
> 

On ARM, we have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
but I find it hard to believe that taking alignment faults is more
efficient than adding two bytes to the start of the frame.

> If NET_IP_ALIGN is zero then I believe that ethernet drivers are
> allowed to build skb that have the frame on a 4n+2 alignment.
> This is probably sensible if the hardware can write the two bytes.
> (DM might correct me there.)
> 

Again, I don't think the FEC can do this, even if PPC does allow
DMA to 4n+2 addresses for other functions.

> 	David
> 
> [1] The original sparc sbus 'DMA' part did multiple 16bit transfers instead
>   of a burst of 32bit transfers. This meant the buffer had to be misaligned
>   and a software copy done to align the frames. Fixed in the DMA+ part.
> 
> [2] PCIe writes are likely to be much faster if they contain entire cache
>   lines of data.
>