Message-ID: <b172b4dad96e519b2f49034ab627f1d666b3df63.camel@infinera.com>
Date: Mon, 2 Nov 2020 08:27:20 +0000
From: Joakim Tjernlund <Joakim.Tjernlund@...inera.com>
To: "dsahern@...il.com" <dsahern@...il.com>,
"linyunsheng@...wei.com" <linyunsheng@...wei.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>
Subject: Re: arping stuck with ENOBUFS in 4.19.150

On Sat, 2020-10-31 at 09:48 +0800, Yunsheng Lin wrote:
> On 2020/10/30 19:50, Joakim Tjernlund wrote:
> > On Fri, 2020-10-30 at 09:36 +0800, Yunsheng Lin wrote:
> > >
> > > On 2020/10/29 23:18, David Ahern wrote:
> > > > On 10/29/20 8:10 AM, Joakim Tjernlund wrote:
> > > > > OK, bisecting (was a bit of a bother since we merge upstream releases into our tree, is there a way to just bisect that?)
> > > > >
> > > > > Result was commit "net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc" (749cc0b0c7f3dcdfe5842f998c0274e54987384f)
> > > > >
> > > > > Reverting that commit on top of our tree made it work again. How to fix?
> > > >
> > > > Adding the author of that patch (linyunsheng@...wei.com) to take a look.
> > > >
> > > >
> > > > >
> > > > > Jocke
> > > > >
> > > > > On Mon, 2020-10-26 at 12:31 -0600, David Ahern wrote:
> > > > > >
> > > > > > On 10/26/20 6:58 AM, Joakim Tjernlund wrote:
> > > > > > > Ping (maybe it should read "arping" instead :)
> > > > > > >
> > > > > > > Jocke
> > > > > > >
> > > > > > > On Thu, 2020-10-22 at 17:19 +0200, Joakim Tjernlund wrote:
> > > > > > > > strace arping -q -c 1 -b -U -I eth1 0.0.0.0
> > > > > > > > ...
> > > > > > > > sendto(3, "\0\1\10\0\6\4\0\1\0\6\234\v\6 \v\v\v\v\377\377\377\377\377\377\0\0\0\0", 28, 0, {sa_family=AF_PACKET, proto=0x806, if4, pkttype=PACKET_HOST, addr(6)={1, ffffffffffff},
> > > > > > > > 20) = -1 ENOBUFS (No buffer space available)
> > > > > > > > ....
> > > > > > > > and then arping loops.
> > > > > > > >
> > > > > > > > in 4.19.127 it was:
> > > > > > > > sendto(3, "\0\1\10\0\6\4\0\1\0\6\234\5\271\362\n\322\212E\377\377\377\377\377\377\0\0\0\0", 28, 0, {sa_family=AF_PACKET, proto=0x806, if4, pkttype=PACKET_HOST, addr(6)={1,
> > > > > > > > ffffffffffff}, 20) = 28
> > > > > > > >
> > > > > > > > Seems like something has changed the IP behaviour between now and then ?
> > > > > > > > eth1 is UP but not RUNNING and has an IP address.
> > >
> > > "eth1 is UP but not RUNNING" usually means the user has configured the netdev as up,
> > > but the hardware has not detected link up yet.
> > >
> > > Also, what is the output of "ethtool eth1"?
> >
> > echo 1 > /sys/class/net/eth1/carrier
> > cu3-jocke ~ # arping -q -c 1 -b -U -I eth1 0.0.0.0
> > cu3-jocke ~ # echo 0 > /sys/class/net/eth1/carrier
> > cu3-jocke ~ # arping -q -c 1 -b -U -I eth1 0.0.0.0
> > ^Ccu3-jocke ~ # ethtool eth1
> > Settings for eth1:
> > 	Supported ports: [ MII ]
> > 	Supported link modes: 1000baseT/Full
> > 	Supported pause frame use: Symmetric Receive-only
> > 	Supports auto-negotiation: Yes
> > 	Advertised link modes: 1000baseT/Full
> > 	Advertised pause frame use: Symmetric Receive-only
> > 	Advertised auto-negotiation: Yes
> > 	Speed: 10Mb/s
> > 	Duplex: Half
> > 	Port: MII
> > 	PHYAD: 1
> > 	Transceiver: external
> > 	Auto-negotiation: on
> > 	Current message level: 0x00000037 (55)
> > 	                       drv probe link ifdown ifup
> > 	Link detected: no
> >
> > We have a writeable carrier since the eth device is PHY-less. Maybe that path is different?
> > Check drivers/net/ethernet/freescale/dpaa/dpa_eth.c
>
> The above difference does not seem to matter.
>
> >
> > >
> > > It would be good to see the status of netdev before and after executing arping cmd
> > > too.
> >
> > hmm, how do you mean?
>
> I was trying to find out when the netdev's state became "eth1 is UP but not RUNNING".
>
> Anyway, when I looked at the backported patch, I found that the new qdisc assignment
> from the upstream patch is missing.
>
> Please see if the patch below fixes your problem, thanks:
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index bd96fd2..4e15913 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1116,10 +1116,13 @@ static void dev_deactivate_queue(struct net_device *dev,
>  				 void *_qdisc_default)
>  {
>  	struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
> +	struct Qdisc *qdisc_default = _qdisc_default;
>  
>  	if (qdisc) {
>  		if (!(qdisc->flags & TCQ_F_BUILTIN))
>  			set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
> +
> +		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
>  	}
>  }

This patch seems to have resolved the problem, thanks.
Please CC me on the formal patch for 4.19.x.
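
For the archive, a rough userspace sketch of why I think the missing assignment shows up as ENOBUFS in arping. This is only my reading and has not been checked against the 4.19.150 tree: I am assuming the transmit path drops with NET_XMIT_DROP when the attached qdisc has the DEACTIVATED bit set, that the noop qdisc reports NET_XMIT_CN instead, and that the packet socket maps anything other than NET_XMIT_CN to -ENOBUFS the way net_xmit_errno() does. Every toy_* name below is made up for illustration; none of it is kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Return codes with the same values as in include/linux/netdevice.h. */
enum { NET_XMIT_SUCCESS = 0x00, NET_XMIT_DROP = 0x01, NET_XMIT_CN = 0x02 };

/* Hypothetical stand-in for struct Qdisc: only what the sketch needs. */
struct toy_qdisc {
	const char *name;
	bool deactivated;		/* models __QDISC_STATE_DEACTIVATED */
	int (*enqueue)(void);
};

/* A normal qdisc accepts the packet. */
static int fifo_enqueue(void) { return NET_XMIT_SUCCESS; }

/* The noop qdisc drops the packet but reports NET_XMIT_CN. */
static int noop_enqueue(void) { return NET_XMIT_CN; }

/* Same mapping as the kernel's net_xmit_errno() macro. */
static int toy_xmit_errno(int rc)
{
	return rc != NET_XMIT_CN ? -ENOBUFS : 0;
}

/* Assumed transmit path: a deactivated qdisc drops before enqueue. */
static int toy_xmit(const struct toy_qdisc *q)
{
	int rc = q->deactivated ? NET_XMIT_DROP : q->enqueue();

	return rc > 0 ? toy_xmit_errno(rc) : rc;
}

/*
 * Models dev_deactivate_queue(). With swap_in_default == false the
 * deactivated qdisc stays attached (the 4.19.y backport as I understand
 * it); with true the default qdisc is attached (the patch above).
 * Returns the qdisc that is attached to the queue afterwards.
 */
static const struct toy_qdisc *toy_deactivate(struct toy_qdisc *q,
					      const struct toy_qdisc *qdisc_default,
					      bool swap_in_default)
{
	q->deactivated = true;
	return swap_in_default ? qdisc_default : q;
}
```

With a toy noop qdisc as the default, toy_xmit() on the queue after toy_deactivate(..., false) returns -ENOBUFS, while after toy_deactivate(..., true) it returns 0. That would match sendto() failing on 4.19.150 and returning 28 on 4.19.127 and with the patch applied, even though the frame is not sent in either case while carrier is off.
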
Jocke