[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220326130148.GC31022@wunner.de>
Date: Sat, 26 Mar 2022 14:01:48 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Andrew Lunn <andrew@...n.ch>
Cc: Oleksij Rempel <o.rempel@...gutronix.de>,
Oliver Neukum <oneukum@...e.com>,
Oleksij Rempel <linux@...pel-privat.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: ordering of call to unbind() in usbnet_disconnect
On Sat, Mar 26, 2022 at 01:44:12PM +0100, Andrew Lunn wrote:
> On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> > Oleksij, I cannot reproduce your stacktrace (included in full length below).
> > I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> > with all your (and other people's) asix patches applied on top,
> > except for 2c9d6c2b871d. Tried unplugging an AX88772A multiple times,
> > never got a stacktrace.
> >
> > I've also walked down the code paths from usbnet_disconnect() and cannot
> > see how the stacktrace could occur.
> >
> > Normally an unregistering netdev is removed from the linkwatch event list
> > (lweventlist) via this call stack:
> >
> > usbnet_disconnect()
> > unregister_netdev()
> > rtnl_unlock()
> > netdev_run_todo()
> > netdev_wait_allrefs()
> > linkwatch_forget_dev()
> > linkwatch_do_dev()
> >
> > For the stacktrace to occur, the netdev would have to be subsequently
> > re-added to the linkwatch event list via linkwatch_fire_event().
>
> What you might be missing is a call to phy_error()
But phy_error() has a WARN_ON(1) right at its top. So it produces
a stacktrace itself. That stacktrace is nowhere to be seen in the
dmesg output Oleksij posted. Hence it can't be caused by phy_error().
Also, recall that unregister_netdev() stops the netdev before
unregistering it. That in turn causes an invocation of phy_stop()
via ax88772_stop(). phy_stop() already puts the PHY into PHY_HALTED
state and resets phydev->link = 0. So a subsequent phy_error() cannot
result in a call to phy_link_down() (which would indeed trigger a
dangerous linkwatch_fire_event()).
> > That is called, among other places, from netif_carrier_off(). However,
> > netif_carrier_off() is already called *before* linkwatch_forget_dev()
> > when unregister_netdev() stops the netdev before unregistering it:
> >
> > usbnet_disconnect()
> > unregister_netdev()
> > unregister_netdevice()
> > unregister_netdevice_queue(dev, NULL)
> > unregister_netdevice_many()
> > dev_close_many()
> > __dev_close_many()
> > usbnet_stop()
> > ax88772_stop()
> > phy_stop() # state = PHY_HALTED
> > phy_state_machine()
>
> I'm guessing somewhere around here:
>
> If it calls into the PHY driver, and the PHY calls for an MDIO bus
> transaction, and that returns an error, -ENODEV or -EIO for example,
> because the USB device has gone away, and that results in a call to
> phy_error().
Oleksij amended phy_state_machine() to bail out if err == -ENODEV
with commit 06edf1a940be ("net: phy: do not print dump stack if device
was removed"). The commit skips the phy_error() on -ENODEV, which
makes a lot of sense.
Thanks,
Lukas
Powered by blists - more mailing lists