lists.openwall.net
Open Source and information security mailing list archives
 
Date:   Tue, 3 Mar 2020 11:23:48 +0100
From:   Ian Kumlien <ian.kumlien@...il.com>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     Roi Dayan <roid@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Yevgeny Kliteynik <kliteyn@...lanox.com>,
        Leon Romanovsky <leonro@...lanox.com>
Subject: Re: [VXLAN] [MLX5] Lost traffic and issues

On Mon, Mar 2, 2020 at 11:45 PM Ian Kumlien <ian.kumlien@...il.com> wrote:
>
> On Mon, Mar 2, 2020 at 8:10 PM Saeed Mahameed <saeedm@...lanox.com> wrote:

[... 8< ...]

> > What type of mlx5 configuration do you have? (Native PV virtualization?
> > SR-IOV? Legacy mode or switchdev mode?)
>
> We have:
> tap -> bridge -> ovs -> bond (one-legged) -switch-fabric-> <other-end>
>
> So, a pretty standard OpenStack setup.

Oh, the L3 nodes are also MLX5s (50 Gbit), and they do report the LAG map messages:

[   37.389366] mlx5_core 0000:04:00.0 ens1f0: S-tagged traffic will be dropped while C-tag vlan stripping is enabled
[77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
[77337.033686] mlx5_core 0000:04:00.0 ens1f0: Link up
[77344.338901] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[78098.028670] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[78103.479494] mlx5_core 0000:04:00.0 ens1f0: Link down
[78310.028518] mlx5_core 0000:04:00.0 ens1f0: Link up
[78317.797155] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[78504.893590] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[78511.277529] mlx5_core 0000:04:00.0 ens1f0: Link down
[78714.526539] mlx5_core 0000:04:00.0 ens1f0: Link up
[78720.422078] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[78720.838063] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[78727.226433] mlx5_core 0000:04:00.0 ens1f0: Link down
[78929.575826] mlx5_core 0000:04:00.0 ens1f0: Link up
[78935.422600] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[79330.519516] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[79330.831447] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[79336.073520] mlx5_core 0000:04:00.1 ens1f1: Link down
[79336.279519] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[79541.272469] mlx5_core 0000:04:00.1 ens1f1: Link up
[79546.664008] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[82107.461831] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[82113.859238] mlx5_core 0000:04:00.1 ens1f1: Link down
[82320.458475] mlx5_core 0000:04:00.1 ens1f1: Link up
[82327.774289] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[82490.950671] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[82497.307348] mlx5_core 0000:04:00.1 ens1f1: Link down
[82705.956583] mlx5_core 0000:04:00.1 ens1f1: Link up
[82714.055134] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[83100.804620] mlx5_core 0000:04:00.0 ens1f0: Link down
[83100.860943] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[83319.953296] mlx5_core 0000:04:00.0 ens1f0: Link up
[83327.984559] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[83924.600444] mlx5_core 0000:04:00.0 ens1f0: Link down
[83924.656321] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[84312.648630] mlx5_core 0000:04:00.0 ens1f0: Link up
[84319.571326] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[84946.495374] mlx5_core 0000:04:00.1 ens1f1: Link down
[84946.588637] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[84946.692596] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[84949.188628] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[85363.543475] mlx5_core 0000:04:00.1 ens1f1: Link up
[85371.093484] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[624051.460733] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[624053.644769] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
[624053.674747] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
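The log is easier to read when the LAG map changes and the link flaps are lined up. Below is a small Python sketch (not part of any mlx5 tooling; all function names are hypothetical) that parses dmesg lines in the format shown. It assumes that "modify lag map port 1:X port 2:Y" means logical port 1 is sent over physical port X and logical port 2 over physical port Y, so 1:1 2:2 is the normal state and 1:2 2:2 or 1:1 2:1 is a failover onto a single link. That reading is an assumption and is not confirmed in this thread.

```python
import re

# "modify lag map" lines carry only the PCI address, e.g.
# "[77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2"
LAG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] mlx5_core (?P<pci>\S+): "
    r"modify lag map port 1:(?P<p1>\d) port 2:(?P<p2>\d)"
)
# Link state lines also carry the netdev name, e.g.
# "[77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down"
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] mlx5_core (?P<pci>\S+) (?P<ifname>\S+): "
    r"Link (?P<state>up|down)"
)


def describe_lag_map(p1: int, p2: int) -> str:
    """Interpret the map as logical port -> physical port (ASSUMPTION)."""
    if (p1, p2) == (1, 2):
        return "normal (each logical port on its own link)"
    if p1 == p2:
        return f"failover (both logical ports on physical port {p1})"
    return "swapped"


def lag_events(lines):
    """Return (timestamp, p1, p2, description) for each LAG map change."""
    events = []
    for line in lines:
        m = LAG_RE.search(line)
        if m:
            p1, p2 = int(m["p1"]), int(m["p2"])
            events.append((float(m["ts"]), p1, p2, describe_lag_map(p1, p2)))
    return events


def link_outages(lines):
    """Return (ifname, down_ts, up_ts, seconds) for each Link down -> up pair."""
    down_at = {}
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        ifname, ts = m["ifname"], float(m["ts"])
        if m["state"] == "down":
            down_at[ifname] = ts
        elif ifname in down_at:
            start = down_at.pop(ifname)
            outages.append((ifname, start, ts, round(ts - start, 3)))
    return outages
```

For example, `link_outages(open("dmesg.txt"))` (file name hypothetical) lists each ens1f0/ens1f1 down/up pair with its duration in seconds, and `lag_events(...)` on the same input shows which LAG map was in effect in between, so both can be checked against the times VXLAN traffic was lost.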

Sorry, it's been a long couple of weeks ;)

> > The only change that I can think of is the LAG multipath support we
> > added. Roi, can you please take a look at this?
>
> I'm also trying to get a setup working where I could try reverting changes,
> but so far we've only had this problem with mlx5_core...
> Also, the intermittent but reliable patterns are really weird...
>
> All traffic seems fine, except VXLAN traffic :/
>
> (The problem is that the actual machines that have the issue are in production
> with 8x NVIDIA V100 cards... Kinda hard to justify having them "offline" ;))
