[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yj8KnE5BeEK1SXDP@lunn.ch>
Date: Sat, 26 Mar 2022 13:44:12 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Lukas Wunner <lukas@...ner.de>
Cc: Oleksij Rempel <o.rempel@...gutronix.de>,
Oliver Neukum <oneukum@...e.com>,
Oleksij Rempel <linux@...pel-privat.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: ordering of call to unbind() in usbnet_disconnect
On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> On Tue, Mar 15, 2022 at 12:38:41PM +0100, Oleksij Rempel wrote:
> > On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
> > > > > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > > > > unregister_netdev()") is causing regressions.
> > >
> > > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > > net_device?
> > >
> > > Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> > > PHY "fails" in that situation. Please elaborate what the failure looked
> > > like. Did you get a stacktrace?
>
> Oleksij, I cannot reproduce your stacktrace (included in full length below).
> I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> with all your (and other people's) asix patches applied on top,
> except for 2c9d6c2b871d. Tried unplugging an AX88772A multiple times,
> never got a stacktrace.
>
> I've also walked down the code paths from usbnet_disconnect() and cannot
> see how the stacktrace could occur.
>
> Normally an unregistering netdev is removed from the linkwatch event list
> (lweventlist) via this call stack:
>
> usbnet_disconnect()
> unregister_netdev()
> rtnl_unlock()
> netdev_run_todo()
> netdev_wait_allrefs()
> linkwatch_forget_dev()
> linkwatch_do_dev()
>
> For the stacktrace to occur, the netdev would have to be subsequently
> re-added to the linkwatch event list via linkwatch_fire_event().
Hi Lukas
What you might be missing is a call to phy_error()
> That is called, among other places, from netif_carrier_off(). However,
> netif_carrier_off() is already called *before* linkwatch_forget_dev()
> when unregister_netdev() stops the netdev before unregistering it:
>
> usbnet_disconnect()
> unregister_netdev()
> unregister_netdevice()
> unregister_netdevice_queue(dev, NULL)
> unregister_netdevice_many()
> dev_close_many()
> __dev_close_many()
> usbnet_stop()
> ax88772_stop()
> phy_stop() # state = PHY_HALTED
> phy_state_machine()
I'm guessing somewhere around here:
If it calls into the PHY driver, and the PHY calls for an MDIO bus
transaction, and that returns an error, -ENODEV or -EIO for example,
because the USB device has gone away, and that results in a call to
phy_error().
void phy_error(struct phy_device *phydev)
{
WARN_ON(1);
mutex_lock(&phydev->lock);
phydev->state = PHY_HALTED;
mutex_unlock(&phydev->lock);
phy_trigger_machine(phydev);
}
That will trigger the PHY state machine to run again, asynchronously.
The end of phy_stop() says:
/* Cannot call flush_scheduled_work() here as desired because
* of rtnl_lock(), but PHY_HALTED shall guarantee irq handler
* will not reenable interrupts.
*/
so it looks like the state machine will run again, and potentially use
netdev.
If the MDIO bus driver is no longer returning ENODEV, than we should
avoid this.
Andrew
Powered by blists - more mailing lists