netdev - Re: [VXLAN] [MLX5] Lost traffic and issues

Message-ID: <CAA85sZuoiWSfMt8H+pjN-Ly=f2wNwG5tPiFZzcc6-1F3fqcO9Q@mail.gmail.com>
Date:   Mon, 2 Mar 2020 23:45:44 +0100
From:   Ian Kumlien <ian.kumlien@...il.com>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     Roi Dayan <roid@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Yevgeny Kliteynik <kliteyn@...lanox.com>,
        Leon Romanovsky <leonro@...lanox.com>
Subject: Re: [VXLAN] [MLX5] Lost traffic and issues

On Mon, Mar 2, 2020 at 8:10 PM Saeed Mahameed <saeedm@...lanox.com> wrote:
> On Fri, 2020-02-28 at 16:02 +0100, Ian Kumlien wrote:
> > Hi,
> >
> > Including netdev - to see if someone else has a clue.
> >
> > We have a few machines in a cloud and when upgrading from 4.16.7 ->
> > 5.4.15 we ran into
> > unexpected and intermittent problems.
> > (I have tested 5.5.6 and the problems persist)
> >
> > What we saw, using several monitoring points, was that traffic
> > disappeared somewhere after "bond0", the last point where we could
> > still see it with tcpdump
> >
> > We had tcpdump running on:
> > 1, DHCP nodes (local tap interfaces)
> > 2, Router instances on L3 node
> > 3, Local node (where the VM runs) (tap, bridge and eventually tap
> > interface dumping VXLAN traffic)
> > 4, Using port mirroring on the 100gbit switch to see what ended up on
> > the physical wire.
> >
> > What we can see is that of the four-step DHCP handshake only two
> > steps work; the fourth step is dropped "on the nic".
> >
> > We can see it go out bond0, in a tagged VLAN and within a VXLAN
> > packet - however the switch never sees it.
>
> Hi,
>
> Have you seen the packets actually going out on one of the mlx5 100gbit
> legs ?

We disabled bond and made it go to only one interface to be able to snoop the
traffic (sorry, sometimes you forget things when writing them down)

And no, the traffic that was lost never reached the "wire" to our knowledge
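
For anyone comparing captures from several points like this, a rough
sketch of how the DHCP steps could be labelled is below. It assumes the
BOOTP/DHCP payload has already been extracted from the capture (outer
VXLAN/UDP/IP headers removed); the function names are made up for
illustration and are not from any tool used in this thread.

```python
# Sketch: label DHCP packets by message type (option 53) so captures
# from different points can be compared step by step.
DHCP_MAGIC = b"\x63\x82\x53\x63"  # magic cookie following the 236-byte BOOTP header
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def dhcp_message_type(payload: bytes):
    """Return the DHCP message type name, or None if it cannot be found."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC:
        return None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option: single byte, no length field
            i += 1
            continue
        if code == 255:      # end option
            break
        if i + 1 >= len(payload):
            break            # truncated option
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return MESSAGE_TYPES.get(value[0], "UNKNOWN(%d)" % value[0])
        i += 2 + length
    return None


def missing_steps(seen_types):
    """Hypothetical helper: handshake steps not seen at a capture point."""
    return [step for step in HANDSHAKE if step not in seen_types]
```

Running missing_steps() on the types seen at bond0 and again on the
types seen on the switch mirror port would show which step goes missing
between the two points.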

> > There have been a few mlx5 changes wrt VXLAN which could be the
> > culprit but it's really hard to judge.
> >
> > dmesg |grep mlx
> > [    2.231399] mlx5_core 0000:0b:00.0: firmware version: 16.26.1040
> > [    2.912595] mlx5_core 0000:0b:00.0: Rate limit: 127 rates are
> > supported, range: 0Mbps to 97656Mbps
> > [    2.935012] mlx5_core 0000:0b:00.0: Port module event: module 0,
> > Cable plugged
> > [    2.949528] mlx5_core 0000:0b:00.1: firmware version: 16.26.1040
> > [    3.638647] mlx5_core 0000:0b:00.1: Rate limit: 127 rates are
> > supported, range: 0Mbps to 97656Mbps
> > [    3.661206] mlx5_core 0000:0b:00.1: Port module event: module 1,
> > Cable plugged
> > [    3.675562] mlx5_core 0000:0b:00.0: MLX5E: StrdRq(1) RqSz(8)
> > StrdSz(64) RxCqeCmprss(0)
> > [    3.846149] mlx5_core 0000:0b:00.1: MLX5E: StrdRq(1) RqSz(8)
> > StrdSz(64) RxCqeCmprss(0)
> > [    4.021738] mlx5_core 0000:0b:00.0 enp11s0f0: renamed from eth0
> > [    4.021962] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
> >
> > I have tried turning all offloads off, but the problem persists as
> > well - it's really weird that it seems to be only some packets.
> >
> > To be clear, the bond0 interface is 2*100gbit, using 802.3ad (LACP)
> > with layer2+3 hashing.
> > This seems to be offloaded into the nic (can it be turned off?) and
> > messages about modifying the "lag map" were
> > quite frequent until we did a firmware upgrade - even with upgraded
> > firmware, it continued but to a lesser extent.
> >
> > With 5.5.7 approaching, we would want a path forward to handle
> > this...
>
>
> What type of mlx5 configuration do you have (Native PV virtualization ?
> SRIOV ? legacy mode or switchdev mode ? )

We have:
tap -> bridge -> ovs -> bond (one legged) -switch-fabric-> <other-end>

So a pretty standard openstack setup

> The only change that i could think of is the lag multi-path support we
> added, Roi can you please take a look at this ?

I'm also trying to get a setup working where i could try reverting changes
but so far we've only had this problem with mlx5_core...
Also the intermittent but reliable patterns are really weird...

All traffic seems fine, except vxlan traffic :/
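
To separate VXLAN traffic from the rest in a capture, the 8-byte VXLAN
header can be decoded as in the sketch below. It follows the header
layout in RFC 7348 (flags byte with the I bit 0x08, 24-bit VNI). The
function name is hypothetical, and the UDP port this setup actually uses
is an assumption; 4789 is just the IANA-assigned value.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned port; the port used here is an assumption
VXLAN_FLAG_VNI = 0x08   # "I" flag: the VNI field is valid


def parse_vxlan(udp_payload: bytes):
    """Split a VXLAN UDP payload into (vni, inner_ethernet_frame)."""
    if len(udp_payload) < 8:
        raise ValueError("payload shorter than the 8-byte VXLAN header")
    # Layout: flags (1 byte), reserved (3 bytes), VNI (3 bytes), reserved (1 byte)
    flags, vni_word = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & VXLAN_FLAG_VNI:
        raise ValueError("VNI flag not set, probably not VXLAN")
    vni = vni_word >> 8  # drop the trailing reserved byte
    return vni, udp_payload[8:]
```

The inner frame returned here is what the tap/bridge side sees, so it
can be matched against the captures taken on the VM side.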

(The problem is that the actual machines that have the issue are in production
with 8x V100 nvidia cards... Kinda hard to justify having them "offline" ;))