[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200312214159.1c41209d@kicinski-fedora-PC1C0HJN>
Date: Thu, 12 Mar 2020 21:41:59 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Shannon Nelson <snelson@...sando.io>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 1/7] ionic: tx and rx queues state follows link
state
On Thu, 12 Mar 2020 17:12:45 -0700 Shannon Nelson wrote:
> On 3/12/20 3:41 PM, David Miller wrote:
> > From: Shannon Nelson <snelson@...sando.io>
> > Date: Thu, 12 Mar 2020 14:50:09 -0700
> >
> >> + if (!test_bit(IONIC_LIF_F_UP, lif->state) &&
> >> + netif_running(netdev)) {
> >> + rtnl_lock();
> >> + ionic_open(netdev);
> >> + rtnl_unlock();
> >> }
> > You're running into a major problem area here.
> >
> > ionic_open() can fail, particularly because it allocates resources.
> >
> > Yet you are doing this in an operational path that doesn't handle
> > and unwind from errors.
> >
> > You must find a way to do this properly, because the current approach
> > can result in an inoperable interface.
>
> I don't see this as much different from how we use it in
> ionic_reset_queues(), which was modeled after some other drivers' uses
> of the open call. In the fw reset case, though, the time between the
> close and the open is many seconds.
Precedent does not make it any better. You're really pushing the
re-open hack to new levels here :(
Why don't you just unregister the netdev? 30 sec is orders of
magnitude past the point where "no impact to the user" could be
claimed.
FWIW please take a look at NFP, nfp_net_ring_reconfig() and its uses,
to see a better way of handling shutdown and reconfiguration.
> Yes, ionic_open() can fail, and it unwinds its own work. There isn't
> anything here in ionic_link_status_check() to unwind, and no one to
> report the error to, so we don't catch the error here. However, it would
> be better if I move the addition of the IONIC_LIF_F_TRANS flag and a
> couple other bits from patch 7 into this patch - I can do that for a v2
> patchset.
Those layers of "how open is the device _really_" state within drivers
are just a breeding ground for bugs. Already you're doing this:
+ if (!test_bit(IONIC_LIF_F_UP, lif->state) &&
+ netif_running(netdev)) {
+ rtnl_lock();
+ ionic_open(netdev);
+ rtnl_unlock();
A lot may happen right before rtnl_lock().
You got the UP bit, the RESET bit, and now TRANS bit..
Powered by blists - more mailing lists