Open Source and information security mailing list archives
 
Message-ID: <Yj8KnE5BeEK1SXDP@lunn.ch>
Date:   Sat, 26 Mar 2022 13:44:12 +0100
From:   Andrew Lunn <andrew@...n.ch>
To:     Lukas Wunner <lukas@...ner.de>
Cc:     Oleksij Rempel <o.rempel@...gutronix.de>,
        Oliver Neukum <oneukum@...e.com>,
        Oleksij Rempel <linux@...pel-privat.de>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: ordering of call to unbind() in usbnet_disconnect

On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> On Tue, Mar 15, 2022 at 12:38:41PM +0100, Oleksij Rempel wrote:
> > On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
> > > > > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > > > > unregister_netdev()") is causing regressions.
> > > 
> > > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > > net_device?
> > > 
> > > Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> > > PHY "fails" in that situation.  Please elaborate what the failure looked
> > > like.  Did you get a stacktrace?
> 
> Oleksij, I cannot reproduce your stacktrace (included in full length below).
> I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> with all your (and other people's) asix patches applied on top,
> except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
> never got a stacktrace.
> 
> I've also walked down the code paths from usbnet_disconnect() and cannot
> see how the stacktrace could occur.
> 
> Normally an unregistering netdev is removed from the linkwatch event list
> (lweventlist) via this call stack:
> 
>           usbnet_disconnect()
>             unregister_netdev()
>               rtnl_unlock()
>                 netdev_run_todo()
>                   netdev_wait_allrefs()
>                     linkwatch_forget_dev()
>                       linkwatch_do_dev()
> 
> For the stacktrace to occur, the netdev would have to be subsequently
> re-added to the linkwatch event list via linkwatch_fire_event().

Hi Lukas

What you might be missing is a call to phy_error().
 
> That is called, among other places, from netif_carrier_off().  However,
> netif_carrier_off() is already called *before* linkwatch_forget_dev()
> when unregister_netdev() stops the netdev before unregistering it:
> 
>           usbnet_disconnect()
>             unregister_netdev()
>               unregister_netdevice()
>                 unregister_netdevice_queue(dev, NULL)
>                   unregister_netdevice_many()
>                     dev_close_many()
>                       __dev_close_many()
>                         usbnet_stop()
>                           ax88772_stop()
>                             phy_stop() # state = PHY_HALTED
>                               phy_state_machine()

I'm guessing somewhere around here:

The state machine may call into the PHY driver, and the PHY driver may
perform an MDIO bus transaction. If that transaction returns an error,
-ENODEV or -EIO for example, because the USB device has gone away, the
result is a call to phy_error().

void phy_error(struct phy_device *phydev)
{
        WARN_ON(1);

        mutex_lock(&phydev->lock);
        phydev->state = PHY_HALTED;
        mutex_unlock(&phydev->lock);

        phy_trigger_machine(phydev);
}

That will trigger the PHY state machine to run again, asynchronously.
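To make the ordering concrete, here is a single-threaded userspace model of that hypothesis. It is not kernel code. All model_* names are hypothetical. The workqueue is reduced to a work_pending flag, locking is omitted, and model_mdio_read() simply assumes the bus returns -ENODEV once the adapter is unplugged. The point it shows is that phy_stop() can complete, and a failing MDIO access afterwards still leaves another state machine run queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for phylib states; not the kernel's enum phy_state. */
enum model_phy_state { MODEL_PHY_RUNNING, MODEL_PHY_HALTED };

struct model_phydev {
    enum model_phy_state state;
    bool usb_present;   /* false once the USB adapter has been unplugged */
    bool work_pending;  /* models the work queued by phy_trigger_machine() */
    int warnings;       /* counts the WARN_ON(1) in phy_error() */
};

/* Hypothetical MDIO read: assumed to fail with -ENODEV once the device is gone. */
static int model_mdio_read(const struct model_phydev *phydev)
{
    return phydev->usb_present ? 0 : -ENODEV;
}

/* Models phy_trigger_machine(): queue a state machine run, do not run it now. */
static void model_phy_trigger_machine(struct model_phydev *phydev)
{
    phydev->work_pending = true;
}

/* Same shape as the phy_error() quoted above (locking omitted). */
static void model_phy_error(struct model_phydev *phydev)
{
    phydev->warnings++;               /* WARN_ON(1) */
    phydev->state = MODEL_PHY_HALTED;
    model_phy_trigger_machine(phydev);
}

/* Models only the part of phy_stop() that matters here: state = HALTED. */
static void model_phy_stop(struct model_phydev *phydev)
{
    phydev->state = MODEL_PHY_HALTED;
}

/* A PHY driver callback doing an MDIO transaction; errors end in phy_error(). */
static int model_phy_driver_op(struct model_phydev *phydev)
{
    int ret = model_mdio_read(phydev);

    if (ret < 0)
        model_phy_error(phydev);
    return ret;
}
```

Calling model_phy_stop() and then model_phy_driver_op() on an unplugged device returns -ENODEV and leaves work_pending set. The stop has finished, yet a new run is queued that nobody flushes.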

The end of phy_stop() says:

        /* Cannot call flush_scheduled_work() here as desired because
         * of rtnl_lock(), but PHY_HALTED shall guarantee irq handler
         * will not reenable interrupts.
         */

So it looks like the state machine will run again and potentially use
the netdev.
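Tying this back to Lukas's point: the stacktrace needs the netdev to be re-added to the linkwatch event list after linkwatch_forget_dev() has removed it. The following is a hypothetical userspace model of just that list membership. The model_* names are illustrative, a bool stands in for lweventlist, and model_late_link_down() stands for a late state machine run reaching linkwatch_fire_event() through netif_carrier_off(). It ignores any checks the real netif_carrier_off() performs before firing an event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified netdev: only what this race needs. */
struct model_netdev {
    bool registered;    /* cleared once unregister_netdev() has finished */
    bool on_linkwatch;  /* stands in for membership of lweventlist */
};

/* Models linkwatch_fire_event(): queue the device for link processing. */
static void model_linkwatch_fire_event(struct model_netdev *dev)
{
    dev->on_linkwatch = true;
}

/* Models linkwatch_forget_dev(): drop the device from the event list. */
static void model_linkwatch_forget_dev(struct model_netdev *dev)
{
    dev->on_linkwatch = false;
}

/* Models the unregister path quoted above: forget, then unregistered. */
static void model_unregister_netdev(struct model_netdev *dev)
{
    model_linkwatch_forget_dev(dev);
    dev->registered = false;
}

/* A late state machine run reporting link down, e.g. via netif_carrier_off(). */
static void model_late_link_down(struct model_netdev *dev)
{
    model_linkwatch_fire_event(dev);
}

/* True if a device that is no longer registered is still queued. */
static bool model_stale_linkwatch_entry(const struct model_netdev *dev)
{
    return !dev->registered && dev->on_linkwatch;
}
```

After model_unregister_netdev() there is no stale entry. A subsequent model_late_link_down() puts the unregistered device back on the list, which is the precondition Lukas describes.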

If the MDIO bus driver is no longer returning -ENODEV, then we should
avoid this.
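One possible reading of "avoid this" is to stop phy_error() from re-queuing the state machine when the bus reports that the device is gone. The following is a hypothetical sketch of that idea on a userspace model. It is not phylib's code and not a proposed patch, and the extra err parameter does not exist on the real phy_error().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for phylib states; not the kernel's enum phy_state. */
enum model_phy_state { MODEL_PHY_RUNNING, MODEL_PHY_HALTED };

struct model_phydev {
    enum model_phy_state state;
    bool work_pending;  /* models the work queued by phy_trigger_machine() */
    int warnings;       /* counts WARN_ON(1) hits */
};

/*
 * Hypothetical variant of phy_error() that is told the failing error code.
 * -ENODEV is treated as "device gone": halt, but queue no further run.
 */
static void model_phy_error_guarded(struct model_phydev *phydev, int err)
{
    phydev->state = MODEL_PHY_HALTED;
    if (err == -ENODEV)
        return;                   /* unplugged: nothing left to drive */
    phydev->warnings++;           /* WARN_ON(1) for genuine errors */
    phydev->work_pending = true;  /* phy_trigger_machine() */
}
```

With -ENODEV the model halts quietly and nothing is queued. With -EIO it behaves like the quoted phy_error(). Whether -EIO from an unplugged USB device should get the same treatment is left open. The sketch only special-cases -ENODEV.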

      Andrew
