netdev - Re: [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum offloads and TSO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UfWUOFs8NvjWFQTtL_3Ure8MWbe5tJhEWFYnowb1kz1Dw@mail.gmail.com>
Date:	Tue, 26 Apr 2016 14:01:44 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Saeed Mahameed <saeedm@....mellanox.co.il>
Cc:	Alex Duyck <aduyck@...antis.com>, Tal Alon <talal@...lanox.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Gal Pressman <galp@...lanox.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum
 offloads and TSO

On Tue, Apr 26, 2016 at 1:23 PM, Saeed Mahameed
<saeedm@....mellanox.co.il> wrote:
> On Tue, Apr 26, 2016 at 6:50 PM, Alex Duyck <aduyck@...antis.com> wrote:
>>
>> The setup is pretty straight forward.  Basically I left the first port
>> in the default namespace and moved the second int a secondary
>> namespace referred to below as $netns.  I then assigned the IPv6
>> addresses fec0::10:1 and fec0::10:2. After that I ran the following:
>>
>>         VXLAN=vx$net
>>         echo $VXLAN ${test_options[$i]}
>>         ip link add $VXLAN type vxlan id $net \
>>                 local fec0::10:1 remote $addr6 dev $PF0 \
>>                 ${test_options[$i]} dstport `expr 8800 + $net`
>>         ip netns exec $netns ip link add $VXLAN type vxlan id $net \
>>                                   local $addr6 remote fec0::10:1 dev $port \
>>                                   ${test_options[$i]} dstport `expr 8800 + $net`
>>         ifconfig $VXLAN 192.168.${net}.1/24
>>         ip netns exec $netns ifconfig $VXLAN 192.168.${net}.2/24
>>
>
> Thanks, indeed i see that GSO is not working with vxlan over IPv6 over
> mlx5 device.
> We will test out those patches on both mlx4 and mlx5, and debug mlx4
> IPv6 issue you see.
>
>>
>>> Anyway, I suspect it might be related to a driver bug most likely in
>>> get_real_size function @en_tx.c
>>> specifically in : *lso_header_size = (skb_inner_transport_header(skb) -
>>> skb->data) + inner_tcp_hdrlen(skb);
>>>
>>> will check this and get back to you.
>>
>> I'm not entirely convinced.  What I was seeing is t hat the hardware
>> itself was performing Rx checksum offload only on tunnels with an
>> outer IPv4 header and ignoring tunnels with an outer IPv6 header.
>
> I don't get it, are you trying to say that the issue is in the RX side ?
> what do you mean by ignoring ? Dropping ? or just not validating checksum ?
> if so why would you disable GSO and IPv6 checksumming on TX ?

I'm suspecting that whatever parsing logic exists in either the
hardware or firmware may not be configured to parse tunnels with outer
IPv6 headers.  The tell-tale sign is what occurs with an IPv6 based
tunnel with no outer checksum.  The hardware is not performing a
checksum on the inner headers so it reports it as a UDP frame with no
checksum to the stack which ends up preventing us from doing GRO.
That tells me that the hardware is not parsing IPv6 based tunnels on
Rx.  I am assuming that if the Rx side doesn't work then there is a
good chance that the Tx won't.

>>>>   @@ -2431,7 +2435,18 @@ static netdev_features_t
>>>> mlx4_en_features_check(struct sk_buff *skb,
>>>>                                                 netdev_features_t
>>>> features)
>>>>   {
>>>>         features = vlan_features_check(skb, features);
>>>> -       return vxlan_features_check(skb, features);
>>>> +       features = vxlan_features_check(skb, features);
>>>> +
>>>> +       /* The ConnectX-3 doesn't support outer IPv6 checksums but it does
>>>> +        * support inner IPv6 checksums and segmentation so  we need to
>>>> +        * strip that feature if this is an IPv6 encapsulated frame.
>>>> +        */
>>>> +       if (skb->encapsulation &&
>>>> +           (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>> +           (ip_hdr(skb)->version != 4))
>>>> +               features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>>>
>>> Dejavu, didn't you fix this already in harmonize_features, in
>>> i.e, it is enough to do here:
>>>
>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL))
>>>             features &= ~NETIF_F_IPV6_CSUM;
>>>
>>
>> So what this patch is doing is enabling an inner IPv6 header offloads.
>> Up above we set the NETIF_F_IPV6_CSUM bit and we want it to stay set
>> unless we have an outer IPv6 header because the inner headers may
>> still need that bit set.  If I did what you suggest it strips IPv6
>> checksum support for inner headers and if we have to use GSO partial I
>> ended up encountering some of the other bugs that I have fixed for GSO
>> partial where either sg or csum are not defined.
>>
>
> I see, you mean that you want to disable checksumming and GSO only for
> packets with Outer(IPv6):Inner(X) and keep it in case for
> Outer(IPv4):Inner(IPv6)
> but i think it is weird that the driver decides to disable features it
> didn't declare in first place (NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK)
>
> Retry:
>
> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL) &&
>     (ip_hdr(skb)->version != 4))
>             features &= ~NETIF_F_IPV6_CSUM;
>
> will this work ?

Sort of.  All that would happen is that you would fall through to
harmonize_features where NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK gets
cleared.  I just figured I would short-cut things since we cannot
support inner checksum or any GSO offloads if the tunnel has an outer
IPv6 header.  In addition this happens to effectively be the same code
I am using in vxlan_features_check to disable things if we cannot
checksum a protocol so it should help to keep the code size smaller
for the function if the compiler is smart enough to coalesce similar
code.

> Anyway i prefer to debug the mlx4 issue first before we discuss the
> best approach to disable checksumming & GSO for outer IPv6 in mlx4.

The current code as-is already has it disabled.  All I am doing is
enabling IPv6 checksums for inner headers as it seems like it doesn't
work for outer headers.

>>>
>>>> +
>>>> +       return features;
>>>>   }
>>>>   #endif
>>>>   diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>>> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>>> index c0d7b7296236..c9f5388ea22a 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>>> @@ -41,6 +41,7 @@
>>>>   #include <linux/vmalloc.h>
>>>>   #include <linux/tcp.h>
>>>>   #include <linux/ip.h>
>>>> +#include <linux/ipv6.h>
>>>>   #include <linux/moduleparam.h>
>>>>     #include "mlx4_en.h"
>>>> @@ -918,8 +919,18 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct
>>>> net_device *dev)
>>>>                                  tx_ind, fragptr);
>>>>         if (skb->encapsulation) {
>>>> -               struct iphdr *ipv4 = (struct iphdr
>>>> *)skb_inner_network_header(skb);
>>>> -               if (ipv4->protocol == IPPROTO_TCP || ipv4->protocol ==
>>>> IPPROTO_UDP)
>>>> +               union {
>>>> +                       struct iphdr *v4;
>>>> +                       struct ipv6hdr *v6;
>>>> +                       unsigned char *hdr;
>>>> +               } ip;
>>>> +               u8 proto;
>>>> +
>>>> +               ip.hdr = skb_inner_network_header(skb);
>>>> +               proto = (ip.v4->version == 4) ? ip.v4->protocol :
>>>> +                                               ip.v6->nexthdr;
>>>> +
>>>> +               if (proto == IPPROTO_TCP || proto == IPPROTO_UDP)
>>>>                         op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP |
>>>> MLX4_WQE_CTRL_ILP);
>>>>                 else
>>>>                         op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP);
>>>
>>>
>>> basically this is a bug fix, I don't know why the original author assumed it
>>> will be ipv4 !
>>
>> Because the feature flags didn't allow it any other way.  I am adding
>> the NETIF_F_TSO6 and NETIF_F_IPV6_CSUM flags in hw_enc_features and so
>> situations such as this couldn't be encountered until you start adding
>> those flags.
>>
>
> True.
>
> Thanks,
> - Saeed