lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 26 Mar 2022 14:01:48 +0100
From:   Lukas Wunner <lukas@...ner.de>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Oleksij Rempel <o.rempel@...gutronix.de>,
        Oliver Neukum <oneukum@...e.com>,
        Oleksij Rempel <linux@...pel-privat.de>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: ordering of call to unbind() in usbnet_disconnect

On Sat, Mar 26, 2022 at 01:44:12PM +0100, Andrew Lunn wrote:
> On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> > Oleksij, I cannot reproduce your stacktrace (included in full length below).
> > I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> > with all your (and other people's) asix patches applied on top,
> > except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
> > never got a stacktrace.
> > 
> > I've also walked down the code paths from usbnet_disconnect() and cannot
> > see how the stacktrace could occur.
> > 
> > Normally an unregistering netdev is removed from the linkwatch event list
> > (lweventlist) via this call stack:
> > 
> >           usbnet_disconnect()
> >             unregister_netdev()
> >               rtnl_unlock()
> >                 netdev_run_todo()
> >                   netdev_wait_allrefs()
> >                     linkwatch_forget_dev()
> >                       linkwatch_do_dev()
> > 
> > For the stacktrace to occur, the netdev would have to be subsequently
> > re-added to the linkwatch event list via linkwatch_fire_event().
> 
> What you might be missing is a call to phy_error()

But phy_error() has a WARN_ON(1) right at its top.  So it produces
a stacktrace itself.  That stacktrace is nowhere to be seen in the
dmesg output Oleksij posted.  Hence it can't be caused by phy_error().

Also, recall that unregister_netdev() stops the netdev before
unregistering it.  That in turn causes an invocation of phy_stop()
via ax88772_stop().  phy_stop() already puts the PHY into PHY_HALTED
state and resets phydev->link = 0.  So a subsequent phy_error() cannot
result in a call to phy_link_down() (which would indeed trigger a
dangerous linkwatch_fire_event()).


> > That is called, among other places, from netif_carrier_off().  However,
> > netif_carrier_off() is already called *before* linkwatch_forget_dev()
> > when unregister_netdev() stops the netdev before unregistering it:
> > 
> >           usbnet_disconnect()
> >             unregister_netdev()
> >               unregister_netdevice()
> >                 unregister_netdevice_queue(dev, NULL)
> >                   unregister_netdevice_many()
> >                     dev_close_many()
> >                       __dev_close_many()
> >                         usbnet_stop()
> >                           ax88772_stop()
> >                             phy_stop() # state = PHY_HALTED
> >                               phy_state_machine()
> 
> I'm guessing somewhere around here:
> 
> If it calls into the PHY driver, and the PHY calls for an MDIO bus
> transaction, and that returns an error, -ENODEV or -EIO for example,
> because the USB device has gone away, and that results in a call to
> phy_error().

Oleksij amended phy_state_machine() to bail out if err == -ENODEV
with commit 06edf1a940be ("net: phy: do not print dump stack if device
was removed").  The commit skips the phy_error() on -ENODEV, which
makes a lot of sense.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ