Message-ID: <3182842f-7b27-9b4e-2e31-a5bd2eae8d9b@gmail.com>
Date: Thu, 28 Apr 2016 16:26:02 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Alexander Duyck <alexander.duyck@...il.com>,
Saeed Mahameed <saeedm@....mellanox.co.il>
Cc: Alex Duyck <aduyck@...antis.com>, Tal Alon <talal@...lanox.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Gal Pressman <galp@...lanox.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [net-next PATCH 6/8] mlx4: Add support for inner IPv6 checksum
offloads and TSO
On 27/04/2016 9:05 PM, Alexander Duyck wrote:
> On 04/27/2016 08:39 AM, Tariq Toukan wrote:
>>
>>
>> On 27/04/2016 12:01 AM, Alexander Duyck wrote:
>>> On Tue, Apr 26, 2016 at 1:23 PM, Saeed Mahameed
>>> <saeedm@....mellanox.co.il> wrote:
>>>> On Tue, Apr 26, 2016 at 6:50 PM, Alex Duyck <aduyck@...antis.com>
>>>> wrote:
>>>>> The setup is pretty straightforward. Basically I left the first
>>>>> port in the default namespace and moved the second into a secondary
>>>>> namespace referred to below as $netns. I then assigned the IPv6
>>>>> addresses fec0::10:1 and fec0::10:2. After that I ran the following:
>>>>>
>>>>> VXLAN=vx$net
>>>>> echo $VXLAN ${test_options[$i]}
>>>>> ip link add $VXLAN type vxlan id $net \
>>>>>     local fec0::10:1 remote $addr6 dev $PF0 \
>>>>>     ${test_options[$i]} dstport `expr 8800 + $net`
>>>>> ip netns exec $netns ip link add $VXLAN type vxlan id $net \
>>>>>     local $addr6 remote fec0::10:1 dev $port \
>>>>>     ${test_options[$i]} dstport `expr 8800 + $net`
>>>>> ifconfig $VXLAN 192.168.${net}.1/24
>>>>> ip netns exec $netns ifconfig $VXLAN 192.168.${net}.2/24
>>>>>
>>>> Thanks, indeed I see that GSO is not working with vxlan over IPv6 on
>>>> mlx5 devices.
>>>> We will test out those patches on both mlx4 and mlx5, and debug the
>>>> mlx4 IPv6 issue you see.
>>>>
>>>>>> Anyway, I suspect it might be related to a driver bug, most likely
>>>>>> in the get_real_size function @ en_tx.c, specifically in:
>>>>>>
>>>>>>     *lso_header_size = (skb_inner_transport_header(skb) -
>>>>>>                         skb->data) + inner_tcp_hdrlen(skb);
>>>>>>
>>>>>> will check this and get back to you.
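
For context, the code Saeed points at lives in the LSO branch of
get_real_size() in drivers/net/ethernet/mellanox/mlx4/en_tx.c. A rough
sketch of the surrounding logic -- paraphrased, not the exact upstream
source:

	if (skb_is_gso(skb)) {
		/* LSO: the header replicated into each segment spans up
		 * to and including the inner TCP header for encapsulated
		 * frames, or the outer TCP header otherwise.
		 */
		if (skb->encapsulation)
			*lso_header_size = (skb_inner_transport_header(skb) -
					    skb->data) + inner_tcp_hdrlen(skb);
		else
			*lso_header_size = skb_transport_offset(skb) +
					   tcp_hdrlen(skb);
	}

If the inner header offsets are not what the driver expects for vxlan
over IPv6, this computation would yield a bogus header size.
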
>>>>> I'm not entirely convinced. What I was seeing is that the hardware
>>>>> itself was performing Rx checksum offload only on tunnels with an
>>>>> outer IPv4 header and ignoring tunnels with an outer IPv6 header.
>>>> I don't get it, are you trying to say that the issue is on the RX
>>>> side? What do you mean by ignoring? Dropping? Or just not validating
>>>> the checksum?
>>>> If so, why would you disable GSO and IPv6 checksumming on TX?
>>> I'm suspecting that whatever parsing logic exists in either the
>>> hardware or firmware may not be configured to parse tunnels with outer
>>> IPv6 headers. The tell-tale sign is what occurs with an IPv6-based
>>> tunnel with no outer checksum. The hardware is not validating the
>>> checksum on the inner headers, so it reports the packet to the stack
>>> as a UDP frame with no checksum, which ends up preventing us from
>>> doing GRO. That tells me the hardware is not parsing IPv6-based
>>> tunnels on Rx, and I am assuming that if the Rx side doesn't work
>>> there is a good chance the Tx side won't either.
>>>
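
To put the Rx-side difference in skb terms -- an illustrative sketch,
not actual driver code -- a NIC that parses the tunnel and validates
the inner checksum would report something like:

	/* Both the outer UDP and the inner checksum were verified,
	 * so GRO can aggregate the inner TCP flow.
	 */
	skb->ip_summed = CHECKSUM_UNNECESSARY;
	skb->csum_level = 1;

whereas a NIC that does not parse the IPv6-based tunnel reports:

	/* The frame goes up as an opaque UDP datagram with no checksum
	 * information, so GRO cannot coalesce the inner flow.
	 */
	skb->ip_summed = CHECKSUM_NONE;
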
>>>>>>> @@ -2431,7 +2435,18 @@ static netdev_features_t mlx4_en_features_check(struct sk_buff *skb,
>>>>>>>  					      netdev_features_t features)
>>>>>>>  {
>>>>>>>  	features = vlan_features_check(skb, features);
>>>>>>> -	return vxlan_features_check(skb, features);
>>>>>>> +	features = vxlan_features_check(skb, features);
>>>>>>> +
>>>>>>> +	/* The ConnectX-3 doesn't support outer IPv6 checksums but it does
>>>>>>> +	 * support inner IPv6 checksums and segmentation so we need to
>>>>>>> +	 * strip that feature if this is an IPv6 encapsulated frame.
>>>>>>> +	 */
>>>>>>> +	if (skb->encapsulation &&
>>>>>>> +	    (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>>>>> +	    (ip_hdr(skb)->version != 4))
>>>>>>> +		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>>>>>> Déjà vu, didn't you fix this already in harmonize_features?
>>>>>> I.e., it is enough to do here:
>>>>>>
>>>>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL))
>>>>>> 	features &= ~NETIF_F_IPV6_CSUM;
>>>>>>
>>>>> So what this patch is doing is enabling inner IPv6 header offloads.
>>>>> Up above we set the NETIF_F_IPV6_CSUM bit and we want it to stay set
>>>>> unless we have an outer IPv6 header, because the inner headers may
>>>>> still need that bit set. If I did what you suggest, it would strip
>>>>> IPv6 checksum support for inner headers, and when we then have to
>>>>> use GSO partial I end up hitting some of the other bugs I have fixed
>>>>> for GSO partial, where either sg or csum is not defined.
>>>>>
>>>> I see, you mean that you want to disable checksumming and GSO only for
>>>> packets with Outer(IPv6):Inner(X) and keep them in the case of
>>>> Outer(IPv4):Inner(IPv6),
>>>> but I think it is weird that the driver decides to disable features it
>>>> didn't declare in the first place (NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK).
>>>>
>>>> Retry:
>>>>
>>>> if (skb->encapsulation && (skb->ip_summed == CHECKSUM_PARTIAL) &&
>>>>     (ip_hdr(skb)->version != 4))
>>>> 	features &= ~NETIF_F_IPV6_CSUM;
>>>>
>>>> Will this work?
>>> Sort of. All that would happen is that you would fall through to
>>> harmonize_features, where NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK gets
>>> cleared. I just figured I would short-cut things, since we cannot
>>> support inner checksums or any GSO offloads if the tunnel has an outer
>>> IPv6 header. In addition, this happens to effectively be the same code
>>> I am using in vxlan_features_check to disable things when we cannot
>>> checksum a protocol, so it should help keep the code size smaller for
>>> the function if the compiler is smart enough to coalesce similar code.
>>>
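
For reference, the fall-through Alexander describes: with his earlier
net/core change applied, harmonize_features() drops both the checksum
and GSO masks whenever the protocol cannot be checksummed. A simplified
sketch of that logic (assuming current net-next behavior, and omitting
the highdma handling):

	static netdev_features_t harmonize_features(struct sk_buff *skb,
						    netdev_features_t features)
	{
		int tmp;
		__be16 type = skb_network_protocol(skb, &tmp);

		features = net_mpls_features(skb, features, type);

		/* Segmentation requires checksumming, so if we cannot
		 * checksum the protocol we must drop GSO as well.
		 */
		if (skb->ip_summed != CHECKSUM_NONE &&
		    !can_checksum_protocol(features, type))
			features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);

		return features;
	}

The mlx4_en_features_check() hunk above just reaches the same result
one call earlier.
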
>>>> Anyway, I prefer to debug the mlx4 issue first before we discuss the
>>>> best approach to disabling checksumming & GSO for outer IPv6 in mlx4.
>>> The current code as-is already has it disabled. All I am doing is
>>> enabling IPv6 checksums for inner headers, since they don't appear to
>>> work for outer headers.
>>
>> Hi Alex,
>> I will be working on the mlx4 issue next week after the holidays.
>> I will check this offline in-house, without blocking the series.
>>
>> Regards,
>> Tariq
>>
>
> If it helps, below are the kind of results I am seeing with the patch
> series applied versus what is currently there. The big problem areas
> are IPv6 over IPv4 tunnels, and IPv4 over IPv6 tunnels without checksums.
>
> The breakdown below is bare-ip-version without tunnel, outer-inner
> with tunnel, or outer-csum-inner if a checksum is present on the outer
> UDP header.
>
> After the series is applied you can see that the v6 over v4 issues are
> addressed, and that GSO partial has improved the performance of traffic
> over v4 tunnels with outer UDP checksums.
>
> Throughput  Throughput  Local CPU  Local Service  Result
>             Units       Util %     Demand         Tag
>
> mlx4 - Before
> -------------------------------------------------
> 26616.45    10^6bits/s  3.62       0.357          "v4"
> 26101.18    10^6bits/s  6.72       0.675          "v6"
> 22289.41    10^6bits/s  6.49       0.764          "v4-v4"
>      N/A - could not connect                      "v4-v6"
> 12743.91    10^6bits/s  4.25       0.874          "v4-csum-v4"
>      N/A - could not connect                      "v4-csum-v6"
>     0.69    10^6bits/s  0.66       2519.1         "v6-v4"
>  5924.47    10^6bits/s  4.23       1.871          "v6-v6"
> 10369.95    10^6bits/s  4.33       1.096          "v6-csum-v4"
> 10648.51    10^6bits/s  4.10       1.010          "v6-csum-v6"
>
> mlx4 - After
> -------------------------------------------------
> 26585.36    10^6bits/s  3.60       0.355          "v4"
> 26342.86    10^6bits/s  6.67       0.664          "v6"
> 22295.93    10^6bits/s  6.34       0.746          "v4-v4"
> 19977.24    10^6bits/s  6.04       0.793          "v4-v6"
> 22164.71    10^6bits/s  6.46       0.763          "v4-csum-v4"
> 19685.22    10^6bits/s  6.12       0.815          "v4-csum-v6"
>  6126.80    10^6bits/s  4.29       1.835          "v6-v4"
>  5793.53    10^6bits/s  4.24       1.917          "v6-v6"
> 10278.52    10^6bits/s  4.07       1.039          "v6-csum-v4"
> 10526.68    10^6bits/s  4.11       1.024          "v6-csum-v6"
Yeah, that will help us compare our results and get aligned on
addressing the issues.
Thanks for sharing.
Regards,
Tariq