Message-ID: <CA+mtBx_oxJxteeBnbyboGTSKn3JdBRx2Nb3a9EuCZzG+_U-8Zw@mail.gmail.com>
Date: Wed, 5 Nov 2014 13:59:38 -0800
From: Tom Herbert <therbert@...gle.com>
To: Or Gerlitz <ogerlitz@...lanox.com>
Cc: Florian Westphal <fw@...len.de>,
Linux Netdev List <netdev@...r.kernel.org>,
Jesse Gross <jesse@...ira.com>, Amir Vadai <amirv@...lanox.com>
Subject: Re: mlx4+vxlan offload breaks gre tunnels
On Wed, Nov 5, 2014 at 8:17 AM, Or Gerlitz <ogerlitz@...lanox.com> wrote:
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
>>
>> tl;dr: all TCP packets sent via a GRE tunnel have broken TCP checksums if VXLAN
>> offload is enabled with the mlx4 driver.
>>
>> Given the following config on the tx side:
>> dev=enp3s0
>> ip addr add dev $dev 192.168.23.1/24
>> ip link set $dev up
>> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
>> ip addr add dev mygre 192.168.42.1/24
>> ip link set gre0 up
>> ip link set mygre up
>>
>> and
>>
>> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1 port_type_array=2,2
>>
>> in /etc/modprobe.d/mlx4.conf
>>
>> all TCP packets sent to destinations over the GRE tunnel have bogus TCP
>> checksums (and are tossed on the rx side when the stack validates the TCP checksum).
>>
>> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>>
>> What makes things work for me:
>> either
>>
>> options mlx4_core 1 debug_level=1 port_type_array=2,2
>>
>> (i.e. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>>
>> or not setting NETIF_F_IP_CSUM in hw_enc_features:
>>
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>> dev->priv_flags |= IFF_UNICAST_FLT;
>> if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
>> - dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
>> + dev->hw_enc_features |= NETIF_F_RXCSUM |
>> NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
>>
>> I am not sure if it's the right fix, but to my eyes this basically looks like
>> mlx4 is telling the stack that it can handle TCP checksum offload within
>> tunnels, and that doesn't seem to be the case for all tunnel types (e.g. GRE).
>>
>> Could someone who understands the hw_enc_features specifics better confirm
>> that the above patch is correct (or provide a better/proper fix)?
>
>
> Yep, I can see the problem now. It comes into play with ConnectX3-Pro NICs,
> which support VXLAN offloads (but not with ConnectX3 NICs, which don't), when
> you enable offload support on the CX3-Pro.
>
> The problem originates from the fact that we can't advertise something like
> "the HW can offload the inner checksum of UDP/VXLAN-encapsulated packets (but
> not of GRE ones)", similar to what exists in the GSO space, where there is a
> NETIF_F_GSO_YYY flag for each YYY in the {UDP, SIT, GRE, etc.} tunneling schemes.
>
> I think the best effort we can do now is:
>
> 1. come up with something like the patch below for 3.18, which is
> backportable to -stable kernels; it only arms the HW offloads if the
> OS tells us there's VXLAN in action
>
> 2. come up with proper kernel APIs that let NICs advertise the encap
> schemes for which they can actually offload the inner checksum. Tom... your
> work, currently posted on netdev.
>
Possibly #3: add ndo_gso_check to detect nested tunneling. In this
case it would see that gso_type has both SKB_GSO_GRE and
SKB_GSO_UDP_TUNNEL set.
> Tom/Jesse, thoughts? Are you +1-ing the approach below?
>
> Or.
>
> Tested to work with the following setup, which is a bit different; tell me
> if it works for you:
>
> # node A - with mlx4_en address 192.168.31.18
> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
> ifconfig gre1 10.10.10.18/24 up
> ifconfig gre1 mtu 1450
>
> # node B - with mlx4_en address 192.168.31.17
> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
> ifconfig gre1 10.10.10.17/24 up
> ifconfig gre1 mtu 1450
>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0efbae9..7753833 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct work_struct *work)
> out:
> if (ret)
> en_err(priv, "failed setting L2 tunnel configuration ret %d\n", ret);
> +
> + /* set offloads */
> + priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> + NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> + priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> + priv->dev->features |= NETIF_F_GSO_UDP_TUNNEL;
> }
>
> static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> int ret;
> struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
> vxlan_del_task);
> + /* unset offloads */
> + priv->dev->hw_enc_features = 0;
> + priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
> + priv->dev->features &= ~NETIF_F_GSO_UDP_TUNNEL;
>
> ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
> VXLAN_STEER_BY_OUTER_MAC, 0);
> @@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
> if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
> dev->priv_flags |= IFF_UNICAST_FLT;
>
> - if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> - dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> - NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> - dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> - dev->features |= NETIF_F_GSO_UDP_TUNNEL;
> - }
> -
> mdev->pndev[port] = dev;
>
> netif_carrier_off(dev);
>