Message-ID: <20200501135602.0671c73d@carbon>
Date: Fri, 1 May 2020 13:56:02 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org,
Stephen Hemminger <stephen@...workplumber.org>,
David Ahern <dsahern@...il.com>, brouer@...hat.com
Subject: Re: [PATCH net-next V2] net: sched: fallback to qdisc noqueue if
default qdisc setup fail
On Thu, 30 Apr 2020 12:45:49 -0700
Jakub Kicinski <kuba@...nel.org> wrote:
> On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:
> > Currently if the default qdisc setup/init fails, the device ends up with
> > qdisc "noop", which causes all TX packets to get dropped.
> >
> > With the introduction of sysctl net/core/default_qdisc it is possible
> > to change the default qdisc to be more advanced, which opens for the
> > possibility that Qdisc_ops->init() can fail.
> >
> > This patch detect these kind of failures, and choose to fallback to
> > qdisc "noqueue", which is so simple that its init call will not fail.
> > This allows the interface to continue functioning.
> >
> > V2:
> > As this also captures memory failures, which are transient, the
> > device is not kept in IFF_NO_QUEUE state. This allows the net_device
> > to retry to default qdisc assignment.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>
> I have mixed feelings about this one, I wonder if I'm the only one.
> Seems like failure to allocate the default qdisc is pretty critical,
> the log message may be missed, especially in the boot time noise.
>
> I think a WARN_ON() is in order here, I'd personally just replace the
> netdev_info with a WARN_ON, without the fallback.

It is good that we agree that a failure to set up the default qdisc is
pretty critical. I guess we disagree on whether we should (1) keep the
network functioning in a degraded state, or (2) drop all packets on the
net_device so that people notice.

This change proposes (1), keeping the box functioning. For me it was a
pretty bad experience: when I pushed a new kernel over the network to
my embedded box, I lost all network connectivity. Fortunately I had
serial console access (as this was not an OpenWRT box but a full devel
board), so I could debug, but I could no longer upgrade the kernel. I
clearly noticed, as the box was not operational, but I guess most
people would just give up at this point. (Imagine a small OpenWRT box
whose config sets default_qdisc to fq_codel, which bricks the box
because it cannot allocate memory.)
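
To make the difference between (1) and (2) concrete, here is a rough
user-space sketch. It is not the patch and not kernel code: the struct
and every function name in it are hypothetical, and an init failure is
simulated with a boolean. It only models which qdisc the device ends up
with, not the V2 detail that the device may later retry the default
qdisc assignment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a qdisc; not the kernel's struct Qdisc. */
struct qdisc_model {
	const char *name;
	bool drops_all_tx;	/* true for "noop": every TX packet is dropped */
};

static const struct qdisc_model noop_model    = { "noop",     true  };
static const struct qdisc_model noqueue_model = { "noqueue",  false };
/* Assumed default for this sketch, as if default_qdisc were fq_codel. */
static const struct qdisc_model default_model = { "fq_codel", false };

/*
 * Stand-in for creating the default qdisc. init_fails simulates an
 * init() failure, for example a failed memory allocation.
 */
static const struct qdisc_model *create_default_model(bool init_fails)
{
	return init_fails ? NULL : &default_model;
}

/*
 * with_fallback == true models option (1): fall back to "noqueue".
 * with_fallback == false models option (2): stay on "noop".
 */
static const struct qdisc_model *
attach_default_model(bool init_fails, bool with_fallback)
{
	const struct qdisc_model *q = create_default_model(init_fails);

	if (q)
		return q;
	return with_fallback ? &noqueue_model : &noop_model;
}
```

In this model, attach_default_model(true, true) leaves the device on
"noqueue" and still able to transmit, while attach_default_model(true,
false) leaves it on "noop", where all TX packets are dropped.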

I hope that people will notice this degraded state when they start to
transfer data to the device, because running 'noqueue' on a physical
device will result in net_crit_ratelimited() messages like the ones
below:

[86971.609318] Virtual device eth0 asks to queue packet!
[86971.622183] Virtual device eth0 asks to queue packet!
[86971.627510] Virtual device eth0 asks to queue packet!
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer