Message-ID: <CAM0EoMnM-s4M4HFpK1MVr+ey6PkU=uzwYsUipc1zBA5RPhzt-A@mail.gmail.com>
Date: Mon, 24 Apr 2023 13:59:15 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: Leon Romanovsky <leon@...nel.org>,
Victor Nogueira <victor@...atatu.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, xiyou.wangcong@...il.com, jiri@...nulli.us,
kernel@...atatu.com
Subject: Re: [PATCH net v2] net/sched: act_mirred: Add carrier check
On Mon, Apr 24, 2023 at 1:44 PM Stephen Hemminger
<stephen@...workplumber.org> wrote:
>
> On Mon, 24 Apr 2023 20:36:02 +0300
> Leon Romanovsky <leon@...nel.org> wrote:
>
> > > There are cases where the device is administratively UP, but operationally
> > > down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps)
> > > whose cable was pulled out; here is its ip link output:
> > >
> > > 5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
> > > link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff
> > > altname enp179s0f1np1
> > >
> > > As you can see, it's administratively UP but operationally down.
> > > In this case, sending a packet to this port caused a nasty kernel hang (so
> > > nasty that we were unable to capture it). Aborting a transmit based on
> > > operational status (in addition to administrative status) fixes the issue.
> > >
>
> Then fix the driver. It shouldn't hang.
> Other drivers just drop packets if link is down.

We didn't do extensive testing of drivers, but consider this a safeguard
against buggy drivers (it's a huge process upgrading drivers in some
environments). It may even make sense to move this to dev_queue_xmit(),
i.e. the argument is: why is the core sending a packet to hardware
that has link down to begin with? BTW, I believe the bridge behaves
this way ...
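
To make the safeguard concrete, here is a minimal userspace sketch of the
decision being argued for: transmit only when the device is both
administratively up and has carrier. This is a model, not the patch itself.
struct model_netdev, MODEL_IFF_UP, and model_xmit_check() are hypothetical
names invented for illustration; in the kernel the corresponding checks
would be on dev->flags & IFF_UP and netif_carrier_ok(dev).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the administrative IFF_UP flag. */
#define MODEL_IFF_UP 0x1u

/* Hypothetical, simplified model of a network device; the real
 * struct net_device is far larger and lives in the kernel. */
struct model_netdev {
	unsigned int flags;	/* administrative flags */
	bool carrier_ok;	/* models what netif_carrier_ok() reports */
};

enum xmit_verdict {
	XMIT_OK,		/* hand the packet to the driver */
	XMIT_DROP_ADMIN_DOWN,	/* device is administratively down */
	XMIT_DROP_NO_CARRIER,	/* admin UP, but link is operationally down */
};

/* Decide whether a packet should be handed to the device at all.
 * The administrative check comes first; the carrier check is the
 * additional safeguard discussed in this thread. */
static enum xmit_verdict model_xmit_check(const struct model_netdev *dev)
{
	if (!(dev->flags & MODEL_IFF_UP))
		return XMIT_DROP_ADMIN_DOWN;
	if (!dev->carrier_ok)
		return XMIT_DROP_NO_CARRIER;
	return XMIT_OK;
}
```

With the ens2f1 state quoted above (UP set, NO-CARRIER), this model returns
XMIT_DROP_NO_CARRIER, so the packet never reaches the driver.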
cheers,
jamal