netdev - Re: [PATCH net-next V2] net: sched: fallback to qdisc noqueue if default qdisc setup fail


Message-ID: <20200501135602.0671c73d@carbon>
Date:   Fri, 1 May 2020 13:56:02 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org,
        Stephen Hemminger <stephen@...workplumber.org>,
        David Ahern <dsahern@...il.com>, brouer@...hat.com
Subject: Re: [PATCH net-next V2] net: sched: fallback to qdisc noqueue if
 default qdisc setup fail

On Thu, 30 Apr 2020 12:45:49 -0700
Jakub Kicinski <kuba@...nel.org> wrote:

> On Thu, 30 Apr 2020 13:42:22 +0200 Jesper Dangaard Brouer wrote:
> > Currently if the default qdisc setup/init fails, the device ends up with
> > qdisc "noop", which causes all TX packets to get dropped.
> > 
> > With the introduction of sysctl net/core/default_qdisc it is possible
> > to change the default qdisc to be more advanced, which opens for the
> > possibility that Qdisc_ops->init() can fail.
> > 
> > This patch detect these kind of failures, and choose to fallback to
> > qdisc "noqueue", which is so simple that its init call will not fail.
> > This allows the interface to continue functioning.
> > 
> > V2:
> > As this also captures memory failures, which are transient, the
> > device is not kept in IFF_NO_QUEUE state.  This allows the net_device
> > to retry to default qdisc assignment.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>  
> 
> I have mixed feelings about this one, I wonder if I'm the only one.
> Seems like failure to allocate the default qdisc is pretty critical,
> the log message may be missed, especially in the boot time noise.
> 
> I think a WARN_ON() is in order here, I'd personally just replace the
> netdev_info with a WARN_ON, without the fallback.

It is good that we agree that a failure to set up the default qdisc is
pretty critical.  I guess we disagree on whether we should (1) keep the
network functioning in a degraded state, or (2) drop all packets on the
net_device such that people notice.

This change proposes (1), keeping the box functioning.  For me it was a
pretty bad experience that, when I pushed a new kernel over the network
to my embedded box, I lost all network connectivity.  Fortunately, I
had serial console access (as this was not an OpenWRT box but a full
devel board), so I could debug, but I could no longer upgrade the
kernel.  I clearly noticed, as the box was not operational, but I
guess most people would just give up at this point.  (Imagine a small
OpenWRT box whose config sets default_qdisc to fq_codel, which bricks
the box because it cannot allocate memory.)

I hope that people will notice this degraded state when they start to
transfer data to the device, because running 'noqueue' on a physical
device results in net_crit_ratelimited() messages like these:

 [86971.609318] Virtual device eth0 asks to queue packet!
 [86971.622183] Virtual device eth0 asks to queue packet!
 [86971.627510] Virtual device eth0 asks to queue packet!
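
To make the two options concrete, here is a minimal user-space sketch of
the difference.  It is not the patch and not kernel code: struct
qdisc_model, the *_model_init() helpers and the attach_*() functions are
hypothetical names invented for illustration, and the allocation failure
is simulated with a flag.  It only assumes what the thread states: the
default qdisc's init can fail, "noqueue" init cannot, and (per V2) a
later attempt may succeed once the transient failure is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a qdisc with an init hook; not the kernel API. */
struct qdisc_model {
	const char *name;
	int (*init)(void);	/* 0 on success, negative errno on failure */
};

/* Simulates a transient memory shortage during init. */
static bool simulate_alloc_failure = true;

static int fq_codel_model_init(void)
{
	return simulate_alloc_failure ? -ENOMEM : 0;
}

/* "noqueue" is simple enough that its init is assumed never to fail. */
static int noqueue_model_init(void)
{
	return 0;
}

static const struct qdisc_model default_model = {
	"fq_codel", fq_codel_model_init
};
static const struct qdisc_model noqueue_model = {
	"noqueue", noqueue_model_init
};

/* Option (1): on failure, fall back to noqueue so TX keeps working. */
static const char *attach_with_fallback(const struct qdisc_model *def)
{
	if (def->init() == 0)
		return def->name;

	fprintf(stderr, "default qdisc %s failed, falling back to %s\n",
		def->name, noqueue_model.name);
	noqueue_model.init();
	return noqueue_model.name;
}

/* Option (2): on failure, stay on "noop", which drops every TX packet. */
static const char *attach_without_fallback(const struct qdisc_model *def)
{
	return def->init() == 0 ? def->name : "noop";
}
```

With simulate_alloc_failure set, attach_with_fallback(&default_model)
yields "noqueue" while attach_without_fallback(&default_model) yields
"noop".  Clearing the flag and calling attach_with_fallback() again
yields "fq_codel", which mirrors the V2 idea that the fallback is not
sticky and the default qdisc assignment can be retried.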

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer